From Theory to Practice

Shumo Chu, Magdalena Balazinska, Dan Suciu
2015 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data - SIGMOD '15  
Big data analytics often requires processing complex queries using massive parallelism, where the main performance metric is the communication cost incurred during data reshuffling. In this paper, we describe a system that can efficiently compute complex join queries, including queries with cyclic joins, on a massively parallel architecture. We build on two independent lines of work for multi-join query evaluation: a communication-optimal algorithm for distributed evaluation, and a worst-case optimal algorithm for sequential evaluation. We evaluate these algorithms together, then describe novel, practical optimizations for both algorithms.
doi:10.1145/2723372.2750545 dblp:conf/sigmod/ChuBS15 fatcat:ozf3ykax5bea5plbrrp7innvfa