Histograms as a side effect of data movement for big data

Zsolt Istvan, Louis Woods, Gustavo Alonso
2014 Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14  
Histograms are a crucial part of database query planning but their computation is resource-intensive. As a consequence, generating histograms on database tables is typically performed as a batch job, separately from query processing. In this paper, we show how to calculate statistics as a side effect of data movement within a DBMS using a hardware accelerator in the data path. This accelerator analyzes tables as they are transmitted from storage to the processing unit, and provides histograms ... the data retrieved for queries at virtually no extra performance cost. To evaluate our approach, we implemented this accelerator on an FPGA. This prototype calculates histograms faster and with similar or better accuracy than commercial databases. Moreover, the FPGA can provide various types of histograms such as Equidepth, Compressed, or Max-diff on the same input data in parallel, without additional overhead.
doi:10.1145/2588555.2612174 dblp:conf/sigmod/IstvanWA14 fatcat:hkiti7hllfh7dgqtaguww7kgya