Data management and query: Estimating query result sizes for proxy caching in scientific database federations

Tanu Malik, Randal Burns, Nitesh V. Chawla, Alex Szalay
2006 Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06  
In a proxy cache for federations of scientific databases, it is important to estimate the size of a query result before making a caching decision. With accurate estimates, near-optimal cache performance can be obtained. At the other extreme, inaccurate estimates can render the cache totally ineffective. We present classification and regression over templates (CAROT), a general method for estimating query result sizes, which is suited to the resource-limited environment of proxy caches and the ... d nature of database federations. CAROT estimates query result sizes by learning the distribution of query results from the observed workload rather than by examining or sampling data. We have integrated CAROT into the proxy cache of the National Virtual Observatory (NVO) federation of astronomy databases. Experiments conducted in the NVO show that CAROT dramatically outperforms conventional estimation techniques and provides near-optimal cache performance.
doi:10.1145/1188455.1188562 dblp:conf/sc/MalikBCS06 fatcat:6avilhqbkzh6flkugosxtic57u