Optimizing Large-Scale Semi-Naïve Datalog Evaluation in Hadoop [chapter]

Marianne Shaw, Paraschos Koutris, Bill Howe, Dan Suciu
2012 Lecture Notes in Computer Science  
We explore the design and implementation of a scalable Datalog system using Hadoop as the underlying runtime system. Observing that several successful projects provide a relational algebra-based programming interface to Hadoop, we argue that a natural extension is to add recursion to support scalable social network analysis, internet traffic analysis, and general graph query. We implement semi-naive evaluation in Hadoop, then apply a series of optimizations spanning fundamental changes to the Hadoop infrastructure to basic configuration guidelines that collectively offer a 10x improvement in our experiments. This work lays the foundation for a more comprehensive cost-based algebraic optimization framework for parallel recursive Datalog queries.
doi:10.1007/978-3-642-32925-8_17 fatcat:dplnz7d4jbfhdpvnrhtan3rixy
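
For readers unfamiliar with the semi-naive evaluation mentioned in the abstract, the following is a minimal single-machine sketch in Python, applied to the textbook transitive-closure program. It is an illustration only and does not reflect the authors' Hadoop implementation or their optimizations; the function name `transitive_closure_seminaive` and the edge-set representation are assumptions made for this example.

```python
def transitive_closure_seminaive(edges):
    """Semi-naive evaluation of the Datalog program

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).

    Hypothetical in-memory sketch; not the paper's Hadoop code.
    """
    edges = set(edges)

    # Index edges by source node so the join on y is a dictionary lookup.
    successors = {}
    for y, z in edges:
        successors.setdefault(y, set()).add(z)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts that were new in the previous iteration

    while delta:
        # Join only the newly derived facts with edge, rather than
        # re-joining the whole path relation as naive evaluation would.
        derived = {(x, z) for (x, y) in delta for z in successors.get(y, ())}
        # Keep only facts not seen before; these form the next delta.
        delta = derived - path
        path |= delta

    return path


if __name__ == "__main__":
    print(sorted(transitive_closure_seminaive([(1, 2), (2, 3), (3, 4)])))
```

The loop stops once an iteration produces no new facts, that is, when the fixpoint is reached. The key point is that each round joins only `delta` against `edge`, which avoids rederiving facts already known from earlier rounds.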