Exploiting statistics on query expressions for optimization

Nicolas Bruno, Surajit Chaudhuri
2002 Proceedings of the 2002 ACM SIGMOD international conference on Management of data - SIGMOD '02  
Statistics play an important role in influencing the plans produced by a query optimizer. Traditionally, optimizers use statistics built over base tables and assume independence between attributes while propagating statistical information through the query plan. This approach can introduce large estimation errors, which may result in the optimizer choosing inefficient execution plans. In this paper, we show how to extend a generic optimizer so that it also exploits statistics built on
more » ... s corresponding to intermediate nodes of query plans. We show that in some cases, the quality of the resulting plans is significantly better than when only basetable statistics are available. Unfortunately, even moderately-sized schemas may have too many relevant candidate statistics. We introduce a workload-driven technique to identify a small subset of statistics that can provide significant benefits over just maintaining base-table statistics. Finally, we present experimental results on an implementation of our approach in Microsoft SQL Server 2000.
doi:10.1145/564720.564722 fatcat:cvw6vyavzzcdlha3a3kpiqnbzq