XML with data values: typechecking revisited

Noga Alon, Tova Milo, Frank Neven, Dan Suciu, Victor Vianu
2003 Journal of Computer and System Sciences (Print)
We investigate the typechecking problem for XML queries: statically verifying that every answer to a query conforms to a given output DTD, for inputs satisfying a given input DTD. This problem had been studied by a subset of the authors in a simplified framework that captured the structure of XML documents but ignored data values. We revisit here the typechecking problem in the more realistic case when data values are present in documents and tested by queries. In this extended framework, typechecking quickly becomes undecidable. However, it remains decidable for large classes of queries and DTDs of practical interest. The main contribution of the present paper is to trace a fairly tight boundary of decidability for typechecking with data values. The complexity of typechecking in the decidable cases is also considered.

... r XML documents. The benefits of schemas are numerous. Some are analogous to those derived from schema information in relational query processing. Perhaps most important in the context of the Web, schemas can be used to validate data exchange. In a typical scenario, a user community would agree on a common schema and on producing only XML documents that are valid with respect to the specified schema. This raises the issue of (static) typechecking: verifying at compile time that every XML document that is the result of a specified query applied to a valid input document satisfies the output schema.

The typechecking problem takes as input a query and two schemas (or types), one for the input XML documents and one for the output XML documents generated by the query. The goal is to verify whether all the XML documents generated by the query, when applied to documents that conform to the input type, conform to the output type. In practice, the typechecker is a program module that analyses the query and either accepts or rejects it.

One approach to typechecking is type inference, a technique derived from functional programming languages and first adapted to XML by XDuce [15, 16]. Murata also addresses the type inference problem for transformations expressed with certain tree automata [21]. Given a program (and possibly an input type), the type inference system constructs a most general output type for that program in a bottom-up fashion. Typechecking can then be performed by checking that the inferred type is a subset of the given output type.
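To make the problem statement concrete, here is a minimal Python sketch. It is not the paper's method. The encoding is an assumption made for illustration: a tree is a pair `(label, children)`, a DTD is a dictionary mapping each label to a regular expression over child labels (each label followed by a space), and `query`, `bounded_typecheck`, and the example DTDs are hypothetical names. The sketch only tests a finite list of inputs, so it can find a counterexample to typechecking but can never establish that a query typechecks.

```python
import re

def valid(tree, dtd):
    """Check that every node's sequence of child labels matches its content model."""
    label, children = tree
    model = dtd.get(label)
    if model is None:
        return False
    # Encode the child labels as a string, each label followed by one space.
    word = "".join(child[0] + " " for child in children)
    if re.fullmatch(model, word) is None:
        return False
    return all(valid(child, dtd) for child in children)

def conforms(tree, dtd, root):
    """A tree conforms to a DTD if it has the right root label and is valid."""
    return tree[0] == root and valid(tree, dtd)

def query(tree):
    """Hypothetical query: emit one <b> under <out> for each <a> child of the input root."""
    _, children = tree
    return ("out", [("b", []) for child in children if child[0] == "a"])

def bounded_typecheck(q, inputs, in_dtd, in_root, out_dtd, out_root):
    """Return the first valid input whose output violates the output DTD, else None."""
    for t in inputs:
        if conforms(t, in_dtd, in_root) and not conforms(q(t), out_dtd, out_root):
            return t
    return None

# Assumed example types: the input root holds any number of <a> elements.
IN_DTD = {"root": r"(a )*", "a": r""}
OUT_STRICT = {"out": r"b (b )*", "b": r""}  # at least one <b>
OUT_LOOSE = {"out": r"(b )*", "b": r""}     # any number of <b>

# A finite sample of inputs: roots with 0, 1, 2, or 3 <a> children.
inputs = [("root", [("a", [])] * n) for n in range(4)]

print(bounded_typecheck(query, inputs, IN_DTD, "root", OUT_STRICT, "out"))  # ('root', [])
print(bounded_typecheck(query, inputs, IN_DTD, "root", OUT_LOOSE, "out"))   # None
```

The empty input is valid but produces an `out` element with no `b` child, so the query fails to typecheck against the stricter output DTD. A result of `None` for the looser DTD means only that no counterexample exists among the sampled inputs; a static typechecker must reason about all valid inputs at once.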
The type inference approach is quite appealing in practice because typechecking is easy to implement and is extensible to a large class of query languages; for example, XQuery uses this approach [11]. However, we showed in a previous paper [19] that any type inference system is incomplete, i.e., it cannot compute the most general output type and, as a consequence, the resulting typechecking algorithm may reject some queries that are correct. In practice this is a serious limitation for XML typecheckers, forcing users to turn the typechecker off (when this is an option), or to rewrite the query in non-obvious ways, in an attempt to overcome the typechecker's limitations.

The second approach to typechecking, which we advocated in [19], is to design specific techniques that are complete for a given query language. We considered a particular class of tree transformations that can be expressed by so-called k-pebble transducers, which we showed to be powerful enough to subsume the tree manipulation core of practical XML query languages, including recursive traversals as in XSLT [7], and described a method for typechecking all transformations in this class. The technique, however, is specific to the particular language considered, i.e., the class of transformations expressed by k-pebble transducers, and does not extend in obvious ways to other languages. The main limitation of k-pebble transducers is that they do not allow joins between data values, a feature found in most query languages. We showed in [19] that typechecking becomes undecidable if k-pebble transducers are extended with joins between data values. However, this negative result is not worrisome in itself, because the class of transformations defined by k-pebble transducers with joins is more powerful than what is needed in practice.
Thus, the results in [19] leave unexplored a large class of queries of significant practical interest: queries that can express joins by comparing data values, but perform less powerful tree restructurings than k-pebble transducers. This class is precisely where practical declarative query languages lie (XML-QL [9], XQuery [4]) and deserves a thorough investigation. The present paper investigates typechecking of queries with comparisons of data values. We focus on declarative query languages in the style of XML-QL and XQuery and various fragments thereof, with path expressions containing regular expressions, but without recursive functions, ...

N. Alon et al. / Journal of Computer and System Sciences 66 (2003) 688-727

* the input does not represent a correct encoding of some non-empty relation R and its projections;
* some dependency sAD is violated;
* f is satisfied.
doi:10.1016/s0022-0000(03)00032-1 fatcat:lievrskogndkzgl6qtmr7gwntq