Fast Multi-Column Sorting in Main-Memory Column-Stores

Wenjian Xu, Ziqiang Feng, Eric Lo
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
Sorting is a crucial operation that can be used to implement SQL operators such as GROUP BY, ORDER BY, and SQL:2003 PARTITION BY. Queries with multiple attributes in those clauses are common in real workloads. When executing such queries, state-of-the-art main-memory column-stores require one round of sorting per input column. With the advent of recent fast scans and denormalization techniques, that kind of multi-column sorting could become a bottleneck. In this paper, we propose a new technique called "code massaging", which manipulates the bits across the columns so that the overall sorting time can be reduced by eliminating some rounds of sorting and/or by improving the degree of SIMD data-level parallelism. Empirical results show that a main-memory column-store with code massaging can achieve speedups of up to 4.7X, 4.7X, 4X, and 3.2X on TPC-H, TPC-H skew, TPC-DS, and a real workload, respectively.
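To illustrate how manipulating bits across columns can eliminate a round of sorting, here is a minimal sketch, not the authors' implementation. It assumes each column holds non-negative fixed-width integer codes (as in a dictionary-encoded column-store) and that the second column's codes fit in `bits_b` bits. The function names and the `bits_b` parameter are hypothetical. The sketch covers only the round-elimination idea; it does not model the SIMD-related aspects mentioned in the abstract.

```python
def stitch_keys(col_a, col_b, bits_b):
    """Concatenate two fixed-width column codes into one integer key per row.

    Assumes every value in col_b is in the range [0, 2**bits_b).
    """
    return [(a << bits_b) | b for a, b in zip(col_a, col_b)]


def sort_one_round(col_a, col_b, bits_b):
    """Order rows by (col_a, col_b) with a single sort over stitched keys."""
    keys = stitch_keys(col_a, col_b, bits_b)
    return sorted(range(len(keys)), key=keys.__getitem__)


def sort_round_per_column(col_a, col_b):
    """Baseline: one stable sort per column, least significant column first."""
    order = sorted(range(len(col_a)), key=col_b.__getitem__)
    return sorted(order, key=col_a.__getitem__)


# Example data: row positions ordered by (col_a, col_b).
col_a = [2, 1, 2, 1, 0]
col_b = [3, 5, 1, 5, 7]   # all values fit in 3 bits
one_round = sort_one_round(col_a, col_b, bits_b=3)
per_column = sort_round_per_column(col_a, col_b)
```

Both functions return the same row order for this input, but the stitched version sorts only once. Whether stitching pays off in practice depends on the resulting key width, which is the trade-off the abstract alludes to.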
doi:10.1145/2882903.2915205 dblp:conf/sigmod/XuFL16 fatcat:h2ritugj2fhl3gkr4w7j4wee7y