Describing and querying semistructured data: Some expressiveness results
Lecture Notes in Computer Science
mdr@wins.uva.nl
WWW home page: http://www.wins.uva.nl/~mdr

Data in traditional relational and object-oriented databases is highly structured and subject to explicit schemas. Much data, for example on the world-wide web, is only semistructured: there may be some regularities, but not all data need adhere to them, and the format itself may be subject to frequent change. The important issues in the area of semistructured data are how to describe (or constrain) semistructured data, and how
to query it. It is generally agreed that the appropriate data model for semistructured data is an edge-labeled graph, but beyond that there are many competing proposals. Various constraint languages and query languages have been proposed, but what is lacking so far are 'sound theoretical foundations, possibly a logic in the style of relational calculus. So, there is a need for more works on calculi for semistructured data and algebraizations of these calculi'.

One of the main methodological points of this paper is the following. There are many areas in computer science and beyond in which describing and reasoning about finite graphs is a key issue. There exists a large body of work in areas such as feature structures (see, for example, [Rounds 1996]) or process algebra [Baeten, Weijland 1990, Milner 1989] which can be usefully applied in database theory. In particular, many results from modal logic are relevant here. The aim of the present contribution is to map new languages for semistructured data to well-studied formal languages and, by doing so, to characterise their complexity and expressive power.

Using the above strategy, we study several languages proposed to express information about the format of semistructured data, namely data guides [Goldman, Widom 1997], graph schemas [Buneman et al. 1997] and some classes of path constraints [Abiteboul, Vianu 1997]. Among the results we have obtained are the following:

Theorem 1. Every set of (graph) databases defined by a data guide is definable by an existential first-order formula.

Theorem 2. Every set of (graph) databases conforming to an acyclic graph schema is definable by a universal formula.
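
The difference between the existential and universal shapes in Theorems 1 and 2 can be illustrated with a minimal Python sketch over an edge-labeled graph. It is not the construction behind either theorem, and it does not implement data guides or graph schemas. The triple encoding of edges, the function names `has_label_path` and `only_labels`, and the toy database are all assumptions made for illustration.

```python
from typing import Hashable, Iterable, Sequence

# Assumed encoding: an edge is a (source node, edge label, target node) triple.
Edge = tuple[Hashable, str, Hashable]


def has_label_path(edges: Iterable[Edge], start: Hashable, labels: Sequence[str]) -> bool:
    """Existential-style property: SOME path from `start` spells out `labels`."""
    edge_list = list(edges)
    frontier = {start}
    for label in labels:
        # Follow every edge with the required label out of the current frontier.
        frontier = {dst for (src, lab, dst) in edge_list if src in frontier and lab == label}
        if not frontier:
            return False
    return True


def only_labels(edges: Iterable[Edge], allowed: set[str]) -> bool:
    """Universal-style property: EVERY edge carries a label from `allowed`."""
    return all(lab in allowed for (_, lab, _) in edges)


# Hypothetical toy database: a root with two "person" children.
db = {
    ("root", "person", "p1"),
    ("root", "person", "p2"),
    ("p1", "name", "n1"),
    ("p2", "name", "n2"),
    ("p2", "email", "e2"),
}

print(has_label_path(db, "root", ["person", "email"]))
print(only_labels(db, {"person", "name"}))
```

For these two particular checks, adding edges to the toy database can only turn `has_label_path` from false to true, while it can only turn `only_labels` from true to false. That is the intuition behind reading the first as a "some witness exists" statement and the second as an "all edges satisfy" statement.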