Analysis of Accuracy of Data Reduction Techniques [chapter]

Pedro Furtado, H. Madeira
1999 Lecture Notes in Computer Science  
There is a growing interest in the analysis of data in warehouses. Data warehouses can be extremely large, and typical queries frequently take too long to answer. Manageable and portable summaries return interactive response times in exploratory data analysis. Obtaining the best estimates for smaller response times and storage needs is the objective of simple data reduction techniques that usually produce coarse approximations. But because the user is exposed to the approximation returned, it is important to determine which queries would not be approximated satisfactorily, in which case either the base data is accessed (if available) or the user is warned. In this paper the accuracy of approximations is determined experimentally for simple data reduction algorithms and several data sets. We show that data cube density and distribution skew are important parameters and that large range queries are approximated much more accurately than point or small range queries. We quantify this and other results that should be taken into consideration when incorporating the data reduction techniques into the design.
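The contrast between point and range queries can be illustrated with a minimal sketch. It is not one of the algorithms evaluated in the paper: the bucket-average reduction, the function names, and the cell values below are all hypothetical, chosen only to show why errors inside a bucket tend to cancel when a query spans many cells.

```python
def reduce_to_buckets(values, bucket_size):
    """Replace each run of bucket_size cells by its average (a coarse summary)."""
    return [
        sum(values[i:i + bucket_size]) / len(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]


def approx_range_sum(summary, bucket_size, lo, hi):
    """Estimate sum(values[lo:hi]) by assuming every cell equals its bucket average."""
    return sum(summary[i // bucket_size] for i in range(lo, hi))


def relative_error(exact, estimate):
    """Relative error, falling back to absolute error when the exact value is zero."""
    return abs(exact - estimate) / abs(exact) if exact else abs(estimate)


# Hypothetical one-dimensional slice of a data cube with uneven cell values.
values = [1, 9, 2, 8, 3, 7, 4, 6]
bucket_size = 4
summary = reduce_to_buckets(values, bucket_size)

# Point query: a single cell, estimated by its bucket average.
point_error = relative_error(
    values[0], approx_range_sum(summary, bucket_size, 0, 1)
)

# Range query: five consecutive cells; over- and under-estimates partly cancel.
range_error = relative_error(
    sum(values[0:5]), approx_range_sum(summary, bucket_size, 0, 5)
)

print(f"point query relative error: {point_error:.3f}")
print(f"range query relative error: {range_error:.3f}")
```

In this toy data the range estimate is far closer to the exact sum than the point estimate is to the exact cell value, which matches the direction of the result stated in the abstract; the magnitudes say nothing about the paper's measurements.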
doi:10.1007/3-540-48298-9_40 fatcat:xiqv7rmuzvbaldn4nvcof3xbaq