Efficient processing of RDF graph pattern matching on MapReduce platforms

Padmashree Ravindra, Seokyong Hong, HyeongSik Kim, Kemafor Anyanwu
2011 Proceedings of the second international workshop on Data intensive computing in the clouds - DataCloud-SC '11  
Broadened adoption of the Linking Open Data tenets has led to a significant surge in the amount of Semantic Web data, particularly RDF data. This has made scalable data processing techniques for RDF a central issue in the Semantic Web research community. The RDF data model is a fine-grained model representing relationships as binary relations. Thus, answering queries (typically graph pattern matching queries) over RDF data requires several join operations to reassemble ... ted data. While MapReduce-based processing is emerging as the de facto paradigm for processing large-scale data, it is known to be inefficient for join-intensive workloads. In addition, most of the existing techniques for optimizing RDF data processing do not transfer well to the MapReduce model and often require significant lead time for pre-processing. Such a requirement may not be desirable for on-demand cloud database scenarios where the goal is to reduce the Time-To-Result (TTR). In this position paper, we argue that some of these challenges can be overcome by rethinking the operators for graph pattern processing, as well as adopting dynamic optimization techniques that exploit information from previous execution steps in the current execution steps. We present some preliminary evaluation results of the proposed techniques.
doi:10.1145/2087522.2087527 fatcat:fjacm3udwrgrvckodltcszzfsi
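
The abstract's point that binary relations force several joins can be illustrated with a minimal sketch. This is not the operators proposed in the paper; it is a plain in-memory hash join over toy triples, and every name in it (`TRIPLES`, `match_pattern`, `hash_join`, `authors_venue_affiliation`, the sample data) is a hypothetical chosen for illustration. A three-pattern graph query needs two joins here, and in a straightforward MapReduce translation, joins on different keys would typically each require their own shuffle phase.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, object) triples.
TRIPLES = [
    ("paper1", "author", "alice"),
    ("paper1", "venue", "venueA"),
    ("paper2", "author", "bob"),
    ("paper2", "venue", "venueB"),
    ("alice", "affiliation", "org1"),
]


def match_pattern(triples, prop):
    """Return (subject, object) pairs matching the triple pattern ?s prop ?o."""
    return [(s, o) for s, p, o in triples if p == prop]


def hash_join(left, right, left_key, right_key):
    """Join two lists of tuples on the given column positions."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    # Concatenate each left row with every right row sharing its key.
    return [l + r for l in left for r in index.get(l[left_key], [])]


def authors_venue_affiliation(triples):
    """Match: ?paper author ?person . ?paper venue ?v . ?person affiliation ?org"""
    authors = match_pattern(triples, "author")       # (paper, person)
    venues = match_pattern(triples, "venue")         # (paper, venue)
    affils = match_pattern(triples, "affiliation")   # (person, org)
    step1 = hash_join(authors, venues, 0, 0)  # join on ?paper
    step2 = hash_join(step1, affils, 1, 0)    # join on ?person
    return [(paper, person, venue, org)
            for paper, person, _, venue, _, org in step2]


if __name__ == "__main__":
    print(authors_venue_affiliation(TRIPLES))
```

The first join shares the subject `?paper` (a star-shaped subpattern), while the second joins an object to a subject, so the two joins use different keys and cannot be answered by a single grouping on one column.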