A Personalized Smart Cube for Faster and Reliable Access to Data
Daniel K. Antwi, Université D'Ottawa / University Of Ottawa, Université D'Ottawa / University Of Ottawa
2013
Organizations own data sources that contain millions, billions or even trillions of rows and these data are usually highly dimensional in nature. Typically, these raw repositories are comprised of numerous independent data sources that are too big to be copied or joined, with the consequence that aggregations become highly problematic. Data cubes play an essential role in facilitating fast Online Analytical Processing (OLAP) in many multi-dimensional data warehouses. Current data cube
more »
... n techniques have had some success in addressing the above-mentioned aggregation problem. However, the combined problem of reducing data cube size for very large and highly dimensional databases, while guaranteeing fast query response times, has received less attention. Another issue is that most OLAP tools often causes users to be lost in the ocean of data while performing data analysis. Often, most users are interested in only a subset of the data. For example, consider in such a scenario, a business manager who wants to answer the crucial location-related business question. "Why are my sales declining at location X"? This manager wants fast, unambiguous location-aware answers to his queries. He requires access to only the relevant ltered information, as found from the attributes that are directly correlated with his current needs. Therefore, it is important to determine and to extract, only that small data subset that is highly relevant from a particular user's location and perspective. In this thesis, we present the Personalized Smart Cube approach to address the abovementioned scenario. Our approach consists of two main parts. Firstly, we combine vertical partitioning, partial materialization and dynamic computation to drastically reduce the size of the computed data cube while guaranteeing fast query response times. Secondly, our personalization algorithm dynamically monitors user query pattern and creates a personalized data cube for each user. This ensures that users utilize only that small subset of data that is m [...]
doi:10.20381/ruor-3419
fatcat:jzg4ji7hcvflljyqmmpanzqabm