Wavelet-based histograms for selectivity estimation

Yossi Matias, Jeffrey Scott Vitter, Min Wang
1998 SIGMOD Record
Query optimization is an integral part of relational database management systems. One important task in query optimization is selectivity estimation: given a query P, we need to estimate the fraction of records in the database that satisfy P. Many commercial database systems maintain histograms to approximate the frequency distribution of values in the attributes of relations. In this paper, we present a technique based upon a multiresolution wavelet decomposition for building histograms on the underlying data distributions. Histograms built on the cumulative data distributions give very good approximations with limited space usage. We give fast algorithms for constructing histograms and using them in an on-line fashion for selectivity estimation. Our histograms can also be used to provide quick approximate answers to OLAP queries when the exact answers are not required. Our method captures the joint distribution of multiple attributes effectively, especially when the attributes are correlated. Experiments confirm that our histograms offer substantial improvements in accuracy over random sampling and other previous approaches.
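
The idea summarized in the abstract can be illustrated with a minimal one-dimensional sketch. It assumes an unnormalized Haar basis and a simple "keep the m largest weighted coefficients" rule, neither of which the abstract specifies, so it should be read as an illustration of the general technique rather than the authors' algorithm. All function names (`haar_decompose`, `haar_reconstruct`, `build_histogram`, `estimate_selectivity`) are hypothetical, and the frequency vector is made-up example data.

```python
import math
from itertools import accumulate


def haar_decompose(values):
    """Unnormalized Haar transform (pairwise averages and half-differences).

    Assumes len(values) is a power of two. Returns
    [overall average, coarsest detail, ..., finest details].
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    current = [float(v) for v in values]
    detail_levels = []  # finest level first
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        detail_levels.append(details)
        current = averages
    out = current
    for details in reversed(detail_levels):
        out = out + details
    return out


def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    current = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        expanded = []
        for avg, det in zip(current, details):
            expanded.extend((avg + det, avg - det))
        current = expanded
    return current


def build_histogram(freqs, m):
    """Keep the m coefficients of the cumulative distribution with the
    largest magnitude after weighting by sqrt(support size).
    This thresholding rule is an assumption made for the sketch."""
    coeffs = haar_decompose(list(accumulate(freqs)))
    n = len(coeffs)

    def weight(i):
        level = 0 if i == 0 else i.bit_length() - 1
        return abs(coeffs[i]) * math.sqrt(n / 2 ** level)

    keep = sorted(range(n), key=weight, reverse=True)[:m]
    return {"n": n, "total": sum(freqs), "coeffs": {i: coeffs[i] for i in keep}}


def estimate_selectivity(hist, lo, hi):
    """Estimate the fraction of records with lo <= value <= hi.

    The full vector is rebuilt here for clarity only; a practical version
    would touch just the coefficients on the two root-to-leaf paths.
    """
    dense = [hist["coeffs"].get(i, 0.0) for i in range(hist["n"])]
    approx_cum = haar_reconstruct(dense)
    below = approx_cum[lo - 1] if lo > 0 else 0.0
    return (approx_cum[hi] - below) / hist["total"]


if __name__ == "__main__":
    freqs = [4, 0, 3, 1, 0, 0, 6, 2]  # made-up counts for attribute values 0..7
    compact = build_histogram(freqs, m=4)
    print(estimate_selectivity(compact, 2, 3))
```

Storing coefficients of the cumulative distribution means a range predicate needs only two point lookups (the upper bound and the value just below the lower bound), which is why the sketch decomposes the running sum rather than the raw frequencies.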
doi:10.1145/276305.276344 fatcat:fwsu6vzthnhzpafa5jkhpmcdyy