Querying graph databases with XPath

Leonid Libkin, Wim Martens, Domagoj Vrgoč
2013 Proceedings of the 16th International Conference on Database Theory - ICDT '13  
XPath plays a prominent role as an XML navigational language due to several factors, including its ability to express queries of interest, its close connection to yardstick database query languages (e.g., first-order logic), and the low complexity of query evaluation for many fragments. Another common database model, graph databases, also requires heavy use of navigation in queries; yet it largely adopts a different approach to querying, relying on reachability patterns expressed with regular constraints. Our goal here is to investigate the behavior and applicability of XPath-like languages for querying graph databases, concentrating on their expressiveness and the complexity of query evaluation. We are particularly interested in a model of graph data that combines navigation through graphs with querying data held in the nodes, as in, for example, a social network scenario. As navigational languages, we use analogs of core and regular XPath and augment them with various tests on data values. We relate these languages to first-order logic, its transitive closure extensions, and finite-variable fragments thereof, proving several capture results. In addition, we describe their relative expressive power. We then show that they behave very well computationally: they have low-degree polynomial combined complexity, which becomes linear for several fragments. Furthermore, we introduce new types of tests for XPath languages that let them capture first-order logic with data comparisons, and prove that the low complexity bounds continue to apply to such extended languages. Therefore, XPath-like languages seem to be very well suited for querying graphs.

Summary of the results. We start by studying the expressive power. The first set of results concerns pure navigational power (no data-value comparisons). It turns out that GXPath_core captures precisely FO³, first-order logic with three variables, like its analog (core XPath 2.0) on trees. The difference, though, is that on graphs FO ≠ FO³, whereas on trees the two are the same. The proof establishes a connection with relation algebra [43], which was recently studied in connection with purely navigational querying of graph databases, but from a rather different angle (see [23, 24], which compared the relative expressiveness of fragments of relation algebra defined by sets of operators). Note that on trees there is another way of capturing FO, by means of conditional XPath [36], which adds the until operator.
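To make the split between path formulas (binary relations over nodes) and node formulas (sets of nodes) concrete, here is a minimal set-based sketch of a GXPath_core-style fragment over a small data graph. It evaluates every expression bottom-up to a relation, in the spirit of the relation-algebra connection, and includes path complement and an endpoint data-equality test. All function names (`label`, `concat`, `complement`, `eq_ends`, `exists`, `neg`) and the toy social graph with a "city" data value are hypothetical illustrations, not the paper's syntax or its evaluation algorithm; this naive version is polynomial but makes no attempt at the paper's linear-time bounds.

```python
from dataclasses import dataclass


@dataclass
class DataGraph:
    data: dict   # node -> data value (here: a hypothetical "city" attribute)
    edges: set   # labeled edges as (source, label, target) triples

    @property
    def nodes(self):
        return set(self.data)


# Path formulas evaluate to sets of (u, v) node pairs.
def label(a):
    return lambda g: {(u, v) for (u, l, v) in g.edges if l == a}


def inverse(p):
    return lambda g: {(v, u) for (u, v) in p(g)}


def concat(p, q):
    def ev(g):
        succ = {}  # index q's pairs by their source node
        for (v, w) in q(g):
            succ.setdefault(v, set()).add(w)
        return {(u, w) for (u, v) in p(g) for w in succ.get(v, ())}
    return ev


def union(p, q):
    return lambda g: p(g) | q(g)


def complement(p):
    # Path complement: all node pairs not related by p.
    return lambda g: {(u, v) for u in g.nodes for v in g.nodes} - p(g)


def eq_ends(p):
    # Data test on a path: keep pairs whose endpoints carry equal data values.
    return lambda g: {(u, v) for (u, v) in p(g) if g.data[u] == g.data[v]}


# Node formulas evaluate to sets of nodes.
def exists(p):
    # <p>: nodes with at least one outgoing p-path
    return lambda g: {u for (u, _v) in p(g)}


def neg(phi):
    return lambda g: g.nodes - phi(g)


g = DataGraph(
    data={"ann": "Paris", "bob": "Lyon", "cy": "Paris", "dee": "Oslo"},
    edges={("ann", "knows", "bob"), ("bob", "knows", "cy"), ("cy", "knows", "dee")},
)
knows = label("knows")

# Friends of friends living in the same city as the start node.
fof_same_city = eq_ends(concat(knows, knows))
# People whom nobody knows.
unknown = neg(exists(inverse(knows)))

print(sorted(fof_same_city(g)))  # [('ann', 'cy')]
print(sorted(unknown(g)))        # ['ann']
```

Indexing the right operand of `concat` by source node avoids comparing every pair of pairs; `complement` is the operator that takes this fragment beyond its positive part.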
We show that on graphs the analog of conditional XPath goes beyond FO. When we move to GXPath_reg, we show that its positive fragment captures precisely the nested regular expressions [38], proposed as the navigational mechanism for SPARQL. This further confirms the usefulness of XPath for graph querying. Full GXPath_reg is more expressive and corresponds to a fragment of transitive closure logic. We also show that it is incomparable with other graph languages
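The step from core to regular navigation is the Kleene star. As a rough sketch, assuming a handful of positive operators (edge labels, inverse, union, concatenation, star, and a nesting test `[p]` of the kind used in nested regular expressions), the evaluator below computes `p*` as a reflexive-transitive closure by running a graph search from every node. The names `star` and `nest` and the example graph are hypothetical; this illustrates the semantics only and is not the paper's algorithm or its complexity bound.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    nodes: set
    edges: set  # (source, label, target) triples


def label(a):
    return lambda g: {(u, v) for (u, l, v) in g.edges if l == a}


def inverse(p):
    return lambda g: {(v, u) for (u, v) in p(g)}


def union(p, q):
    return lambda g: p(g) | q(g)


def concat(p, q):
    def ev(g):
        succ = {}  # index q's pairs by their source node
        for (v, w) in q(g):
            succ.setdefault(v, set()).add(w)
        return {(u, w) for (u, v) in p(g) for w in succ.get(v, ())}
    return ev


def star(p):
    # p*: reflexive-transitive closure via a depth-first search from every node.
    def ev(g):
        succ = {}
        for (u, v) in p(g):
            succ.setdefault(u, set()).add(v)
        result = set()
        for start in g.nodes:
            seen, stack = {start}, [start]
            while stack:
                u = stack.pop()
                for v in succ.get(u, ()):
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
            result |= {(start, v) for v in seen}
        return result
    return ev


def nest(p):
    # [p]: identity on nodes that have at least one outgoing p-path.
    return lambda g: {(u, u) for (u, _v) in p(g)}


g = Graph(
    nodes={"ann", "bob", "cy", "dee", "acme"},
    edges={
        ("ann", "knows", "bob"), ("bob", "knows", "cy"), ("cy", "knows", "dee"),
        ("bob", "worksAt", "acme"), ("dee", "worksAt", "acme"),
    },
)

# Everyone reachable via knows* who works somewhere.
q = concat(star(label("knows")), nest(label("worksAt")))
# Pairs of people sharing an employer: worksAt followed by its inverse.
colleagues = concat(label("worksAt"), inverse(label("worksAt")))

print(sorted(v for (u, v) in q(g) if u == "ann"))  # ['bob', 'dee']
```

Because no operator here removes pairs (there is no complement or negation), adding edges to the graph can only grow each result, which is what "positive" refers to in this setting.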
doi:10.1145/2448496.2448513 dblp:conf/icdt/LibkinMV13 fatcat:i34u6ty3afh3na3yay6s3ab76a