Fast and accurate computation of equi-depth histograms over data streams

Hamid Mousavi, Carlo Zaniolo
2011 Proceedings of the 14th International Conference on Extending Database Technology - EDBT/ICDT '11  
Equi-depth histograms are a fundamental synopsis widely used in both database and data stream applications, as they underpin many techniques such as query optimization, approximate query answering, distribution fitting, and parallel database partitioning. An equi-depth histogram partitions a sequence of data so that every part contains the same number of data items. In this paper, we present a new algorithm to estimate equi-depth histograms for high-speed data streams over sliding windows. While many previous methods were based on quantile computations, we propose a new method called BAr Splitting Histogram (BASH) that provides an expected ϵ-approximate solution to compute the equi-depth histogram. Extensive experiments show that BASH is at least four times faster than one of the best existing approaches, while achieving similar or better accuracy and in some cases using less memory. The experimental results also indicate that BASH is more stable on data affected by frequent concept shifts.
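To make the definition of an equi-depth histogram concrete, the following is a minimal Python sketch of the exact, sort-based baseline over a sliding window. It is not the BASH algorithm, whose details are not given in this abstract. The names `equi_depth_boundaries` and `SlidingWindowHistogram` are hypothetical, and the convention that each boundary is the smallest value of the next bucket is an assumption made for illustration.

```python
from collections import deque
from typing import List, Sequence


def equi_depth_boundaries(values: Sequence[float], num_buckets: int) -> List[float]:
    """Return the num_buckets - 1 interior boundaries of an exact
    equi-depth histogram, computed by fully sorting the input."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    # Assumed convention: boundary i is the element at rank i * n / num_buckets
    # (floored), i.e. the first value that belongs to bucket i.
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]


class SlidingWindowHistogram:
    """Keeps the most recent window_size items and recomputes the exact
    histogram on demand. Illustrative only: each query costs a full sort."""

    def __init__(self, window_size: int, num_buckets: int) -> None:
        # deque with maxlen drops the oldest item automatically on append.
        self.window = deque(maxlen=window_size)
        self.num_buckets = num_buckets

    def add(self, value: float) -> None:
        self.window.append(value)

    def boundaries(self) -> List[float]:
        return equi_depth_boundaries(self.window, self.num_buckets)


if __name__ == "__main__":
    hist = SlidingWindowHistogram(window_size=8, num_buckets=4)
    for v in [5, 1, 4, 2, 8, 3, 7, 6]:
        hist.add(v)
    print(hist.boundaries())
```

Re-sorting the whole window on every query is the kind of cost that approximate streaming methods aim to avoid; the sketch is only meant to show what the target output looks like.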
doi:10.1145/1951365.1951376 dblp:conf/edbt/MousaviZ11 fatcat:qwlwglnrsvfnhkyz3rb26k2rwq