Bindings-Restricted Triple Pattern Fragments [chapter]

Olaf Hartig, Carlos Buil-Aranda
2016 Lecture Notes in Computer Science  
The Triple Pattern Fragment (TPF) interface is a recent proposal for reducing server load in Web-based approaches to executing SPARQL queries over public RDF datasets. The price for less overloaded servers is a higher client-side load and a substantial increase in network load (in terms of both the number of HTTP requests and data transfer). In this paper, we propose a slightly extended interface that allows clients to attach intermediate results to triple pattern requests. The response to such a request is expected to contain triples from the underlying dataset that not only match the given triple pattern (as in the case of TPF), but are also guaranteed to contribute to a join with the given intermediate result. Our hypothesis is that a distributed query execution using this extended interface can reduce the network load (in comparison to a pure TPF-based query execution) without reducing the overall throughput of the client-server system significantly. Our main contribution in this paper is twofold: we empirically verify the hypothesis and provide an extensive experimental comparison of our proposal and TPF.

Recent years have witnessed a large and constant growth in the amount of data that is structured based on the data model of the Resource Description Framework (RDF) [5] and made available on the Web through HTTP interfaces [3, 12, 15]. A prevalent (and standardized) type of such interface that provides query-based access to RDF data is the SPARQL endpoint [6], that is, a Web service that accepts queries written in the SPARQL query language [9]. While a SPARQL endpoint enables users to query its RDF dataset by using the full potential of SPARQL, providing such comparably complex functionality presents a serious challenge (the evaluation problem of a core fragment of SPARQL has been shown to be PSPACE-complete [14]). As a consequence, many public endpoints suffer from frequent downtime; for instance, by monitoring over 400 such endpoints for 27 months, Buil-Aranda et al. show that only 32.3% of the endpoints offer an availability of more than 99%, and 50.4% have an availability of less than 95% [4]. Furthermore, if many client applications start to access such an endpoint concurrently, then the performance of the endpoint (in terms of average query execution times and per-client query throughput) drops significantly [19].

⋆ This document is an extended preprint of a paper published in the proceedings of the ODBASE 2016 conference [11].
In contrast to the proceedings version, this document contains Appendixes A and B, which present additional experimental results.
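
The difference between a plain triple pattern request and a bindings-restricted one, as described in the abstract, can be illustrated with a minimal in-memory sketch. This is not the authors' server implementation and does not model the HTTP interface or its request format; the function names (`tpf_select`, `brtpf_select`), the tuple-based triple representation, the `?`-prefix convention for variables, and the sample data are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables (assumed convention)
Binding = Dict[str, str]        # variable name -> RDF term


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Pattern, triple: Triple) -> Optional[Binding]:
    """Return the variable binding under which triple matches pattern, else None."""
    binding: Binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable that occurs twice must bind to the same term both times.
            if binding.get(p, t) != t:
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding


def compatible(a: Binding, b: Binding) -> bool:
    """Two bindings are join-compatible if they agree on every shared variable."""
    return all(b[var] == val for var, val in a.items() if var in b)


def tpf_select(data: List[Triple], pattern: Pattern) -> List[Triple]:
    """TPF-style selection: every triple that matches the pattern."""
    return [t for t in data if match(pattern, t) is not None]


def brtpf_select(data: List[Triple], pattern: Pattern,
                 bindings: List[Binding]) -> List[Triple]:
    """Bindings-restricted selection: matching triples that can join with
    at least one of the attached intermediate-result bindings."""
    result = []
    for t in data:
        m = match(pattern, t)
        if m is not None and any(compatible(m, b) for b in bindings):
            result.append(t)
    return result


# Hypothetical sample data.
DATA: List[Triple] = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:dave", "foaf:knows", "ex:erin"),
    ("ex:bob", "foaf:name", '"Bob"'),
]
PATTERN: Pattern = ("?x", "foaf:knows", "?y")
INTERMEDIATE: List[Binding] = [{"?x": "ex:alice"}]

all_matches = tpf_select(DATA, PATTERN)                  # 3 triples
restricted = brtpf_select(DATA, PATTERN, INTERMEDIATE)   # 2 triples
```

With this sample data, the plain selection returns all three `foaf:knows` triples, while the restricted selection drops the `ex:dave` triple because it cannot join with the attached binding for `?x`. That reduction in transferred triples is the effect on network load that the paper's hypothesis is about.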
doi:10.1007/978-3-319-48472-3_48 fatcat:6glsfnbfgjdkzofth3r5g6vxi4