Efficient Parallel Processing of Analytical Queries on Linked Data [chapter]

Stefan Hagedorn, Kai-Uwe Sattler
2013 Lecture Notes in Computer Science  
Linked data has become one of the most successful movements of the Semantic Web community. RDF and SPARQL have been established as de-facto standards for representing and querying linked data and there exists quite a number of RDF stores and SPARQL engines that can be used to work with the data. However, for many types of queries on linked data these stores are not the best choice regarding query execution times. For example, users are interested in analytical tasks such as profiling or finding correlated entities in their datasets. In this paper we argue that currently available RDF stores are not optimal for such scan-intensive tasks. In order to address this issue, we discuss query evaluation techniques for linked data exploiting the features of modern hardware architectures such as big memory and multi-core processors. Particularly, we describe parallelization techniques as part of our CameLOD system (pronounced like the mystical British castle Camelot). Furthermore, we compare our system with the well-known linked data stores Virtuoso and RDF-3X by running different analytical queries on the DBpedia dataset and show that we can outperform these systems significantly.
Keywords: linked data, parallel query processing, micro benchmark
doi:10.1007/978-3-642-41030-7_33 fatcat:6xelv336lbgnfjmq7a4ltt3bnq