Bloom Histogram: Path Selectivity Estimation for XML Data with Updates [chapter]

W WANG, H JIANG, H LU, J YU
2004 Proceedings 2004 VLDB Conference  
Cost-based XML query optimization calls for accurate estimation of the selectivity of path expressions. Some other interactive and internet applications can also benefit from such estimations. While there are a number of estimation techniques proposed in the literature, almost none of them has any guarantee on the estimation accuracy within a given space limit. In addition, most of them assume that the XML data are more or less static, i.e., with few updates. In this paper, we present a framework for XML path selectivity estimation in a dynamic context. Specifically, we propose a novel data structure, bloom histogram, to approximate XML path frequency distribution within a small space budget and to estimate the path selectivity accurately with the bloom histogram. We obtain the upper bound of its estimation error and discuss the trade-offs between the accuracy and the space limit. To support updates of bloom histograms efficiently when underlying XML data change, a dynamic summary layer is used to keep exact or more detailed XML path information. We demonstrate through our extensive experiments that the new solution can achieve significantly higher accuracy with an even smaller space than the previous methods in both static and dynamic environments.
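
The abstract does not spell out the internals of a bloom histogram, so the following is only a minimal sketch of the general idea under stated assumptions: paths are sorted by frequency and split into equal-sized buckets, and each bucket keeps a Bloom filter of its paths together with their mean frequency. The names `BloomFilter`, `BloomHistogram`, `candidates`, and `estimate`, the filter parameters, the equi-count bucketing rule, and the averaging of multiple matching buckets are all illustrative assumptions, not the authors' construction. The dynamic summary layer for updates is not modelled.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (no false negatives)."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class BloomHistogram:
    """List of buckets, each a (Bloom filter of paths, mean frequency) pair."""

    def __init__(self, path_freqs, num_buckets=2):
        # Assumption: equi-count bucketing over paths sorted by frequency.
        ordered = sorted(path_freqs.items(), key=lambda kv: kv[1])
        size = max(1, -(-len(ordered) // num_buckets))  # ceiling division
        self.buckets = []
        for start in range(0, len(ordered), size):
            chunk = ordered[start:start + size]
            bloom = BloomFilter()
            for path, _ in chunk:
                bloom.add(path)
            mean = sum(freq for _, freq in chunk) / len(chunk)
            self.buckets.append((bloom, mean))

    def candidates(self, path):
        """Mean frequencies of every bucket whose filter reports the path."""
        return [mean for bloom, mean in self.buckets if path in bloom]

    def estimate(self, path):
        """Estimated frequency; 0.0 if no bucket claims the path."""
        hits = self.candidates(path)
        return sum(hits) / len(hits) if hits else 0.0


# Hypothetical path frequencies, for illustration only.
hist = BloomHistogram({"/a": 100, "/a/b": 90, "/a/b/c": 5, "/a/d": 3})
print(hist.estimate("/a"))
```

Because a Bloom filter never produces false negatives, a stored path always matches its own bucket; a false positive in another bucket can only pull the estimate toward that bucket's mean, which is why the sketch exposes `candidates` separately from `estimate`.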
doi:10.1016/b978-012088469-8/50024-3 fatcat:nzghhvupsffybcdxatrdtzhaze