Vectorizing Database Column Scans with Complex Predicates

Thomas Willhalm, Ismail Oukid, Ingo Müller, Franz Faerber
2013 Very Large Data Bases Conference  
The performance of the full table scan is critical for the overall performance of column-store database systems such as the SAP HANA database. Compressing the underlying column data format is both an advantage and a challenge, because it reduces the data volume involved in a scan on one hand and introduces the need for decompression during the scan on the other hand. In previous work [26] we have shown how to accelerate the column-scan with range predicates using SIMD instructions. In this
more » ... , we present a framework for vectorized scans with more complex predicates. One important building block is the In-List predicate, where all rows whose values are contained in a given list of values are selected. While this seems to exhibit only little data parallelism on first sight, we show that a performant vectorized implementation is possible using the new Intel AVX2 instruction set. We also improve our previous algorithms by leveraging the increased vector-width. Finally in a detailed performance evaluation, we show the benefit of these optimizations and of the new instruction set: in almost all cases our scans needs less than one CPU cycle per row including scans with In-List predicate, leading to an overall throughput of 8 billion rows per second and more on a single core.
dblp:conf/vldb/WillhalmO0F13 fatcat:4awda4acsjeajdktmb7rsnzz2m