Automated design of multidimensional clustering tables for relational databases [chapter]

Sam S. Lightstone, Bishwaranjan Bhattacharjee
2004 Proceedings 2004 VLDB Conference  
The ability to physically cluster a database table on multiple dimensions is a powerful technique that offers significant performance benefits in many OLAP, warehousing, and decision-support systems. An industrial implementation of this technique for the DB2® Universal Database™ (DB2 UDB) product, called multidimensional clustering (MDC), which co-exists with other classical forms of data storage and indexing methods, was described in VLDB 2003. This paper describes the first published model
more » ... automating the selection of clustering keys in single-dimensional and multidimensional relational databases that use a cell/block storage structure for MDC. For any significant dimensionality (3 or more), the possible solution space is combinatorially complex. The automated MDC design model is based on whatif query cost modeling, data sampling, and a search algorithm for evaluating a large constellation of possible combinations. The model is effective at trading the benefits of potential combinations of clustering keys against data sparsity and performance. It also effectively selects the granularity at which dimensions should be used for clustering (such as week of year versus month of year). We show results from experiments indicating that the model provides design recommendations of comparable quality to those made by human experts. The model has been implemented in the IBM® DB2 UDB for Linux®, UNIX® and Windows® Version 8.2 release.
doi:10.1016/b978-012088469-8.50102-9 dblp:conf/vldb/LightstoneB04 fatcat:acfe4zdcoba2napxj54wjvrtgy