Efficient social network data query processing on MapReduce

Liu Liu, Jiangtao Yin, Lixin Gao
2013 Proceedings of the 5th ACM workshop on HotPlanet - HotPlanet '13  
Social network data analysis becomes increasingly important for business intelligence and online social services. Lots of social network data is presented by Resource Description Framework (RDF). Accordingly, SPARQL, an RDF query language, becomes popular for social network data analysis. As the sizes of social networks expand rapidly, a SPARQL query usually involves a large quantity of data, and thus parallelizing its execution is desirable. MapReduce is a well-known and popular big data
more » ... is tool. However, the state-of-the-art translation from SPARQL queries to MapReduce jobs is not efficient because it mainly follows a two layer rule which needs to transform the SPARQL triple pattern to the standard SQL join. In this paper, we propose two primitives to enable efficient translation from SPARQL queries to MapReduce jobs. We use multiple-join-with-filter to substitute traditional SQL multiple join when feasible, and merge different stages in the query workflow. The evaluation on social network data benchmarks shows that the translation based on these two primitives can achieve up to 2x speedup in query running time comparing to the traditional two layer scheme.
doi:10.1145/2491159.2491169 dblp:conf/sigcomm/LiuYG13 fatcat:c6vn7xqyerh25mbteomhevif7m