Two-variable logic on data trees and XML reasoning

Mikołaj Bojańczyk, Anca Muscholl, Thomas Schwentick, Luc Segoufin
2009 Journal of the ACM  
Motivated by reasoning tasks for XML languages, the satisfiability problem of logics on data trees is investigated. The nodes of a data tree have a label from a finite set and a data value from a possibly infinite set. It is shown that satisfiability for two-variable first-order logic is decidable if the tree structure can be accessed only through the child and the next sibling predicates and the access to data values is restricted to equality tests. From this main result, decidability of satisfiability and containment for a data-aware fragment of XPath and of the implication problem for unary key and inclusion constraints is concluded.

· ments of Core-Data-XPath that have data equality tests either don't have negation or cannot access arbitrarily deep nodes in the tree (in contrast, FO²(∼, +1) has both). They also don't have horizontal navigation and therefore miss an important aspect of XML navigation features. The paper extends the results of [Benedikt et al. 2005] by including the horizontal axis, but the only decidable fragment presented in that paper does not have negation. Finally, the focus of [Benedikt et al. 2005; was to have the precise complexity for the decision procedure of the fragment considered, while the precise complexity of FO²(∼, +1) is still an open issue. An inspection of the current proof of Theorem 3.1 gives an upper bound of 3NExpTime, and the best lower bound we currently have is NExpTime-hardness.

Another logical approach was considered in [Jurdziński and Lazić 2007], which extends from data words to data trees the temporal logic approach of [Demri et al. 2005; Demri and Lazić 2006]. The main contribution of [Jurdziński and Lazić 2007] is an alternating automaton over data trees, which uses registers to inspect data values. The decidability results are for one-way alternating automata with one register, with a non-primitive-recursive lower bound on the complexity. Various fragments of XPath can be encoded into the alternating automata. In general, the set of properties that can be described in our approach and the set of properties that can be described using the alternating automata of [Jurdziński and Lazić 2007] are incomparable. For instance, "every data value appears twice" can be expressed in our logic but not by the alternating automata, while the converse separation is witnessed by "every data value appears at most once on each path".

The logic considered in [Alon et al. 2003] in order to solve the type inference problem is incomparable to FO²(∼, +1). It uses patterns with variables for the data values, together with equality and inequality constraints on the variables, in order to extract the relevant pieces of data. It can use arbitrarily many variables in the patterns, something FO²(∼, +1) cannot do, but it can only inspect the tree up to a given constant depth.

As we have already mentioned, restricting FO to its two-variable fragment is a classical idea when looking for decidability [Grädel and Otto 1999]. Over graphs or over any relational structures, FO is undecidable, while its two-variable fragment is decidable [Mortimer 1975]. This does not imply anything about the decidability of FO²(∼, +1), since the equivalence relation and the two tree successor relations cannot be axiomatized in FO². A recent paper [Kieroński and Otto 2005] generalized the result of [Mortimer 1975] in the presence of one or two equivalence relations. Again, this does not apply to our context, as we also have two successor relations. However, [Kieroński and Otto 2005] also showed that the two-variable fragment of FO with three equivalence relations, without any other structure, is undecidable. This implies that FO²(∼₁, ∼₂, ∼₃, +1) is undecidable. On the other hand, two equivalence relations plus the extension of the successor relation to +1, +2 is already undecidable on words (see long version of ). Therefore, manipulating more than two different attributes at the same time quickly leads to undecidability. Note that this does not imply much for XPath, as already in the presence of two equivalence relations the logic FO²(∼₁, ∼₂, +1) seems to be no longer included in
doi:10.1145/1516512.1516515 fatcat:gzayp5cstveqhfb4kc74xgbemy
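
To make the two separating properties quoted above concrete, here is a minimal Python sketch of a data tree and of direct checks for each property. It is an illustration only, not the paper's decision procedure: the names `Node`, `nodes`, `every_value_appears_twice` and `at_most_once_per_path` are hypothetical, integers stand in for the data domain, and "appears twice" is read as "appears at least twice", an assumption under which the property can be written with two variables as ∀x ∃y (x ≠ y ∧ x ∼ y).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A data tree node (hypothetical representation for illustration)."""
    label: str   # drawn from a finite alphabet
    data: int    # drawn from a possibly infinite domain; only compared with ==
    children: List["Node"] = field(default_factory=list)  # ordered siblings


def nodes(root):
    """Yield every node of the tree in document (pre-)order."""
    yield root
    for child in root.children:
        yield from nodes(child)


def every_value_appears_twice(root):
    """True if each data value occurs on at least two distinct nodes
    (assumed reading of "every data value appears twice")."""
    counts = Counter(n.data for n in nodes(root))
    return all(c >= 2 for c in counts.values())


def at_most_once_per_path(root, seen=frozenset()):
    """True if no data value repeats along any root-to-leaf path."""
    if root.data in seen:
        return False
    seen = seen | {root.data}
    return all(at_most_once_per_path(c, seen) for c in root.children)


# Value 1 occurs twice in the whole tree, but both occurrences lie on one path.
t1 = Node("a", 1, [Node("b", 2, [Node("a", 1)]), Node("c", 2)])

# Value 1 occurs only once, and no path repeats a value.
t2 = Node("a", 1, [Node("b", 2), Node("c", 2)])
```

The first check is global and ignores the tree shape, while the second tracks the values seen along a single downward path. That difference loosely mirrors the contrast drawn in the text between the two-variable logic and the one-register alternating automata, although the sketch itself says nothing about expressibility in either formalism.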