Query Nesting, Assignment, and Aggregation in SPARQL 1.1

Mark Kaminski, Egor V. Kostylev, Bernardo Cuenca Grau
2017 ACM Transactions on Database Systems  
Answering aggregate queries is a key requirement of emerging applications of Semantic Technologies, such as data warehousing, business intelligence and sensor networks. In order to fulfil the requirements of such applications, the standardisation of SPARQL 1.1 led to the introduction of a wide range of constructs that enable value computation, aggregation, and query nesting. In this paper we provide an in-depth formal analysis of the semantics and expressive power of these new constructs as defined in the SPARQL 1.1 specification, and hence lay the necessary foundations for the development of robust, scalable and extensible query engines supporting complex numerical and analytics tasks.

At first sight, query nesting provides a great deal of power and flexibility to the language. It can lead to sophisticated interactions between set and bag semantics, which may be difficult (or, as we will soon see, impossible) to simulate within plain Sparql. Furthermore, subquery nesting can be arbitrarily deep, and it is reasonable to expect that each additional level of nesting may increase the expressive power of the language. We next argue that Sparql_PD queries can be brought into a normal form where the nesting depth is bounded by two; thus, there is a natural bound on the level of nesting after which no further increase in expressive power can be achieved. This normal form is defined next, and one can check that query (Q1) satisfies its requirements.

Definition 3.8. A Sparql_PD query is in s-normal form if it has either the form Distinct(Project(X, P)) with P subquery-free, or the form Project(X, P) where all subquery patterns in P are of the form Distinct(Project(X′, P′)) with P′ subquery-free.

This normal form not only limits the nesting depth, but also restricts the ways in which Project and Distinct can be combined. If a query Q is in Sparql_P (or in Sparql_D) and hence Distinct (respectively, Project) only occurs in the outermost level, then Definition 3.8 requires that pattern P is subquery-free and hence Q is a Sparql query. We next show that each Sparql_PD query can be brought into s-normal form. The normalisation is based on two ideas that are illustrated in the following example.

Example 3.9. For the first idea, consider the Sparql D query
doi:10.1145/3083898 fatcat:76iwfmwenreufej3mszesjcijq
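
The condition in Definition 3.8 is purely syntactic, so it can be read as a structural check on a query's syntax tree. The following is a minimal sketch of such a check, not code from the paper. The classes `Triple`, `Op`, `Project` and `Distinct` are a hypothetical, simplified representation introduced only for this illustration: `Op` stands for any pattern operator (join, union, optional, filter, and so on), and a `Project` or `Distinct` node occurring inside a pattern is treated as a subquery pattern.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


# Hypothetical, simplified syntax tree (an assumption of this sketch).
@dataclass(frozen=True)
class Triple:
    s: str
    p: str
    o: str


@dataclass(frozen=True)
class Op:
    name: str                      # e.g. "And", "Union", "Opt", "Filter"
    args: Tuple["Pattern", ...]


@dataclass(frozen=True)
class Project:
    vars: Tuple[str, ...]
    pattern: "Pattern"


@dataclass(frozen=True)
class Distinct:
    query: "Query"


Query = Union[Project, Distinct]
Pattern = Union[Triple, Op, Project, Distinct]


def subquery_free(p: Pattern) -> bool:
    """True if no Project or Distinct node occurs anywhere in the pattern."""
    if isinstance(p, Triple):
        return True
    if isinstance(p, Op):
        return all(subquery_free(a) for a in p.args)
    return False  # a Project or Distinct node is itself a subquery pattern


def is_flat_distinct_project(q: Pattern) -> bool:
    """True for Distinct(Project(X, P)) with P subquery-free."""
    return (
        isinstance(q, Distinct)
        and isinstance(q.query, Project)
        and subquery_free(q.query.pattern)
    )


def outermost_subqueries(p: Pattern) -> Iterator[Query]:
    """Yield the subquery patterns of p that are not nested in another subquery."""
    if isinstance(p, (Project, Distinct)):
        yield p
    elif isinstance(p, Op):
        for a in p.args:
            yield from outermost_subqueries(a)


def in_s_normal_form(q: Query) -> bool:
    """Check the two allowed shapes of Definition 3.8."""
    if is_flat_distinct_project(q):
        return True
    if isinstance(q, Project):
        # Checking the outermost subqueries suffices: each must itself have
        # a subquery-free body, which rules out any deeper nesting.
        return all(is_flat_distinct_project(s) for s in outermost_subqueries(q.pattern))
    return False


t = Triple("?x", "p", "?y")
flat = Distinct(Project(("?x",), t))
nested_ok = Project(("?x",), Op("And", (t, Distinct(Project(("?y",), t)))))
nested_bad = Project(("?x",), Op("And", (t, Project(("?y",), t))))
too_deep = Project(("?x",), Distinct(Project(("?y",), flat)))
```

Under these assumptions, `flat` and `nested_ok` pass the check, while `nested_bad` (a subquery without Distinct) and `too_deep` (three levels of nesting) do not, which matches the depth bound of two stated in the text.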