XML queries and constraints, containment and reformulation

Alin Deutsch, Val Tannen
2005 Theoretical Computer Science  
Starting from the XQuery language we define XBind, an XML analog of relational conjunctive queries, as well as a related class of XML integrity constraints (dependencies). We identify a fragment of XBind for which containment is decidable, in fact $\Pi_2^p$-complete, and a further fragment for which containment is NP-complete. We extend the containment algorithm to take XML dependencies into account. We give an algorithm for the reformulation of XBind queries under combinations of GAV and LAV XQuery views, as well as additional dependencies. We prove a completeness theorem which guarantees that, under certain conditions, our algorithm will find a minimal reformulation if one exists. Moreover, we identify conditions under which this algorithm achieves optimal complexity bounds. Our results on containment and reformulation depend on certain restrictions on the query and constraint languages. We calibrate the results by showing that lifting these restrictions significantly changes the complexity of the problems.
The first stage of the XQuery semantics is reminiscent of the evaluation of relational conjunctive queries. The analogy is strengthened by the fact that the semantics of XPath expressions [34] consists of unary or binary relations over element (node) identities and/or strings. We are naturally led to a syntax like that of conjunctive queries, but with atoms defined by XPath expressions (in addition to the usual relation predicates). For instance, we associate to the query Q in Example 1.1 the following queries:
$$Xb_o(a) \leftarrow [\texttt{//author/text()}](a)$$
$$Xb_i(a, b, a_1, t) \leftarrow Xb_o(a),\ [\texttt{//book}](b),\ [\texttt{./author/text()}](b, a_1),\ [\texttt{./title}](b, t),\ a = a_1$$
The XPath atoms are understood as relations. For example, $[\texttt{./author/text()}](b, a_1)$ is true iff $a_1$ is the text inside an element (node) tagged author that is a child of the node $b$. And $[\texttt{//book}](b)$ is true iff $b$ is a child tagged book of some element that is a descendant of the root (in fact, all nodes are descendants of the root). The rest of the semantics is as for conjunctive queries. Hence $Xb_o$ computes the bindings for the outer query, while $Xb_i$ computes the bindings for the inner query, for each binding of $a$ produced by the outer query. Notice that $a$ is also in $Xb_i$'s output, in order to preserve the correlation between variable bindings.
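To make the relational reading of XPath atoms concrete, here is a minimal Python sketch that evaluates $Xb_o$ and $Xb_i$ by materializing each atom as a set of tuples and joining them, as one would for a conjunctive query under set semantics. It is an illustration under stated assumptions, not the paper's implementation: the bibliography document and all helper names (`atom_desc_author_text`, `xb_inner`, etc.) are hypothetical, the document of Example 1.1 is not reproduced, and ElementTree `Element` objects (hashed by identity) stand in for node identities.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document (not from the paper); elements play the role of node ids.
DOC = """<bib>
  <book><title>T1</title><author>Ann</author><author>Bob</author></book>
  <book><title>T2</title><author>Ann</author></book>
  <article><title>T3</title><author>Cy</author></article>
</bib>"""


def atom_desc_author_text(root):
    """Unary atom [//author/text()](a): text strings of all author elements."""
    return {e.text for e in root.iter("author") if e.text is not None}


def atom_desc_book(root):
    """Unary atom [//book](b): all book elements (root.iter includes the root itself)."""
    return set(root.iter("book"))


def atom_child_author_text(root):
    """Binary atom [./author/text()](b, a1), ranging over every node b."""
    return {(b, c.text) for b in root.iter() for c in b.findall("author")
            if c.text is not None}


def atom_child_title(root):
    """Binary atom [./title](b, t): t is a title child of node b."""
    return {(b, c) for b in root.iter() for c in b.findall("title")}


def xb_outer(root):
    # Xb_o(a) <- [//author/text()](a)
    return atom_desc_author_text(root)


def xb_inner(root):
    # Xb_i(a, b, a1, t) <- Xb_o(a), [//book](b), [./author/text()](b, a1),
    #                      [./title](b, t), a = a1
    # Evaluated as a plain nested-loop join under set semantics.
    books = atom_desc_book(root)
    authors = atom_child_author_text(root)
    titles = atom_child_title(root)
    result = set()
    for a in xb_outer(root):
        for b, a1 in authors:
            if a != a1 or b not in books:
                continue
            for b2, t in titles:
                if b2 is b:
                    result.add((a, b, a1, t))
    return result


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(sorted(xb_outer(root)))  # ['Ann', 'Bob', 'Cy']
    # Show each binding as (a, title text); b and t are element identities.
    print(sorted((a, t.text) for a, b, a1, t in xb_inner(root)))
    # [('Ann', 'T1'), ('Ann', 'T2'), ('Bob', 'T1')]
```

In this toy run, "Cy" is an outer binding with no inner bindings, since that author element sits under an article rather than a book; keeping $a$ in $Xb_i$'s output is what lets each inner tuple be matched back to its outer binding. The nested-loop join was chosen for readability, not efficiency.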
We call such queries XBind queries because they fully capture the first stage of XQuery evaluation, in which the document is navigated, patterns are matched, and all the bindings for the variables are computed. XBind queries play for us the role of conjunctive queries, with some restrictions on the XPath expressions used, as we shall see. Note that relational conjunctive queries (for binary relation schemas) can be seen straightforwardly as particular cases of XBind queries. But the semantics of XBind queries is more complicated, with more containments and equivalences holding, e.g.:
doi:10.1016/j.tcs.2004.10.032 fatcat:o344trzilvfnbofijl6edmvnti