A Datalog-Based Language for Querying RDF Graphs

Marcelo Arenas, Georg Gottlob, Andreas Pieris
2016 Alberto Mendelzon Workshop on Foundations of Data Management  
RDF is the W3C-recommended data model for representing information about World Wide Web resources, and SPARQL has been the standard language for querying RDF data since its standardization in 2008. One of the distinctive features of Semantic Web data is the existence of vocabularies with predefined semantics: the RDF Schema (RDFS) and the Web Ontology Language (OWL), which can be used to derive logical conclusions from RDF graphs; hence, an RDF query language equipped with reasoning capabilities to deal with these vocabularies is desirable. In addition, navigational capabilities are vital for data models with an explicit graph structure such as RDF [1, 3, 9, 15], while recursive definitions are a key feature for graph query languages [5, 14]. Having an RDF query language available that combines the above key functionalities is of paramount importance for the development of the Semantic Web. This has been recognized by the W3C, which led to the release of SPARQL 1.1 in 2013 [10, 12], that is, an extended version of the 2008 language with reasoning capabilities to deal with RDFS and OWL vocabularies, and a mechanism to express navigation patterns through regular expressions. However, there are still useful queries that cannot be expressed in SPARQL 1.1, due to the lack of general recursion [14]. To the best of our knowledge, the only language that supports the above features for the OWL 2 QL profile of OWL 2, and whose query evaluation problem is tractable in data complexity, is the recently introduced rule-based language TriQ-Lite, the lite version of the highly expressive triple query (TriQ) language [2]. This language is based on Datalog$^{\exists,\neg s,\bot}$, that is, Datalog extended with existential quantification in rule heads, stratified negation, and negative constraints with the falsum ($\bot$) in rule heads. Unfortunately, TriQ-Lite suffers from a serious drawback, which may undermine its advantage as an expressive RDF query language: it is not a plain language. A query language is called plain if it allows the user to write a query as a single program in a simple non-composite syntax. An example of a plain query language is Datalog, where the user simply needs to define a single Datalog program that captures the intended query. The property of plainness provides conceptual simplicity, which is considered to be a key condition for a query language to be useful in practice.
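To make the notions of general recursion and of a single plain Datalog program concrete, the following is a minimal sketch in Python, not taken from the paper. It evaluates a two-rule recursive Datalog program over a small set of RDF-style triples by naive bottom-up iteration. The graph contents, the predicate name `partOf`, and the function name `transitive_part_of` are hypothetical and chosen only for illustration; the sketch covers plain Datalog only and none of the TriQ-Lite extensions (existential quantification, stratified negation, negative constraints).

```python
# Hypothetical RDF graph given as (subject, predicate, object) triples.
TRIPLES = {
    ("a", "partOf", "b"),
    ("b", "partOf", "c"),
    ("c", "partOf", "d"),
    ("a", "label", "x"),
}


def transitive_part_of(triples):
    """Naive bottom-up evaluation of the single Datalog program

        reach(X, Y) :- triple(X, partOf, Y).
        reach(X, Z) :- reach(X, Y), triple(Y, partOf, Z).

    Returns the set of all derived reach(X, Y) facts as pairs.
    """
    # Facts matched by the body of the first (non-recursive) rule.
    edges = {(s, o) for (s, p, o) in triples if p == "partOf"}
    reach = set(edges)
    while True:
        # One application of the recursive rule to the facts derived so far.
        derived = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2}
        new_facts = derived - reach
        if not new_facts:
            # Fixpoint reached: no rule yields a fact that is not already known.
            return reach
        reach |= new_facts


if __name__ == "__main__":
    for pair in sorted(transitive_part_of(TRIPLES)):
        print(pair)
```

The whole query is stated as one program (the two rules in the docstring), which is the sense of plainness described above; the loop simply applies the rules until no new facts appear.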
Although TriQ-Lite is based on an extension of Datalog, the way its syntax and semantics are defined deviates significantly from the standard way of defining Datalog-like languages, and thus it does not inherit the plainness of Datalog. In fact, TriQ-Lite is a composite language, where the user is forced to split the query into several modules $\Pi_1, \ldots, \Pi_n$ so that each $\Pi_i$ can be expressed in the fragment of Datalog$^{\exists,\neg s,\bot}$ underlying TriQ-Lite, while each pair $(\Pi_i, \Pi_{i+1})$ is bridged via a set $Q_i$ of conjunctive queries. This short paper is based on the recent works [2, 11].
dblp:conf/amw/ArenasGP16 fatcat:oiwppq73hbfxzj3rnxkpx4py2m