Query optimization in compressed database systems

Zhiyuan Chen, Johannes Gehrke, Flip Korn
2001 SIGMOD Record
Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techniques to improve the performance of database systems. Previous work describes the benefits of compression for numerical attributes, where data is stored in compressed format on disk. Despite the abundance of string-valued attributes in relational schemas, there is little work on compression for string attributes in a database context. Moreover, none of the previous work suitably addresses the role of the query optimizer: during query execution, data is either eagerly decompressed when it is read into main memory, or it lazily stays compressed in main memory and is decompressed only on demand. In this paper, we present an effective approach for database compression based on lightweight, attribute-level compression techniques. We propose a Hierarchical Dictionary Encoding strategy that intelligently selects the most effective compression method for string-valued attributes. We show that eager and lazy decompression strategies produce suboptimal plans for queries involving compressed string attributes. We then formalize the problem of compression-aware query optimization and propose one provably optimal and two fast heuristic algorithms for selecting a query plan for relational schemas with compressed attributes; our algorithms can easily be integrated into existing cost-based query optimizers. Experiments using TPC-H data demonstrate the impact of our string compression methods and show the importance of compression-aware query optimization. Our approach results in up to an order of magnitude speedup over existing approaches.
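To make the eager versus lazy distinction concrete, the following is a minimal sketch of plain dictionary encoding for one string attribute, with an equality selection evaluated both ways. It is not the paper's Hierarchical Dictionary Encoding or any of its optimization algorithms; the function names (`dictionary_encode`, `select_eager`, `select_lazy`) and the sample column are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with an integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of: Dict[str, int] = {s: i for i, s in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def select_eager(dictionary: List[str], codes: List[int], target: str) -> List[int]:
    """Eager strategy: decompress every value first, then compare strings."""
    decoded = [dictionary[c] for c in codes]
    return [row for row, value in enumerate(decoded) if value == target]


def select_lazy(dictionary: List[str], codes: List[int], target: str) -> List[int]:
    """Lazy strategy: encode the constant once, then compare integer codes."""
    try:
        target_code = dictionary.index(target)
    except ValueError:
        return []  # constant not in the dictionary, so no row can match
    return [row for row, code in enumerate(codes) if code == target_code]


# Hypothetical sample column (illustrative values only).
ship_modes = ["AIR", "TRUCK", "AIR", "RAIL", "TRUCK", "AIR"]
dictionary, codes = dictionary_encode(ship_modes)
eager_rows = select_eager(dictionary, codes, "AIR")
lazy_rows = select_lazy(dictionary, codes, "AIR")
```

Both functions return the same row positions; they differ only in when decompression happens. The sketch covers a single equality predicate and does not show the cases the abstract refers to, where neither fixed strategy yields the best plan.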
doi:10.1145/376284.375692 fatcat:ywafbjusfbhglgpydevo2pdzly