Map-Side Merge Joins for Scalable SPARQL BGP Processing

Martin Przyjaciel-Zablocki, Alexander Schaetzle, Eduard Skaley, Thomas Hornung, Georg Lausen
2013 IEEE 5th International Conference on Cloud Computing Technology and Science
In recent times, it has been widely recognized that, due to their inherent scalability, frameworks based on MapReduce are indispensable for so-called "Big Data" applications. However, for Semantic Web applications using SPARQL, there is still a demand for sophisticated MapReduce join techniques for processing basic graph patterns (BGPs), which are at the core of SPARQL. Renowned for their stable and efficient performance, sort-merge joins have become widely used in DBMSs. In this paper, we demonstrate the adaptation of merge joins for SPARQL BGP processing with MapReduce. Our technique supports both n-way joins and sequences of join operations by applying merge joins within the map phase of MapReduce while the reduce phase is only used to fulfill the preconditions of a subsequent join iteration. Our experiments with the LUBM benchmark show an average performance benefit between 15% and 48% compared to other MapReduce-based approaches while at the same time scaling linearly with the RDF dataset size.
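
The merge join that the abstract moves into the map phase relies on both inputs already being sorted by the join key. The following is a minimal, single-machine sketch of that core step in Python; it is not the authors' MapReduce implementation. The function name `merge_join`, the row layout `(join_key, value)`, and the sample data (bindings for a hypothetical pattern `?x rdf:type Student . ?x takesCourse ?c`) are assumptions made for illustration only.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (join_key, value); layout assumed for this sketch


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[str, str, str]]:
    """Join two inputs that are ALREADY sorted by join key.

    Returns (key, left_value, right_value) for every matching pair.
    """
    out: List[Tuple[str, str, str]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key too small, advance left cursor
        elif lk > rk:
            j += 1          # right key too small, advance right cursor
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


# Hypothetical bindings, both sorted by subject (the join variable ?x).
students = [("s1", "Student"), ("s3", "Student")]
takes = [("s1", "c1"), ("s1", "c2"), ("s2", "c1"), ("s3", "c3")]
joined = merge_join(students, takes)
```

Because each cursor only moves forward, the join reads both inputs once apart from the cross product of matching runs, which is why sorted, co-partitioned inputs are the precondition that a preceding phase has to establish.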
doi:10.1109/cloudcom.2013.9 dblp:conf/cloudcom/Przyjaciel-ZablockiSSHL13 fatcat:72mfthvmnzh73mjxdsksqyera4