From relation algebra to semi-join algebra

Jelle Hellings, Catherine L. Pilachowski, Dirk Van Gucht, Marc Gyssens, Yuqing Wu
2017 Proceedings of The 16th International Symposium on Database Programming Languages - DBPL '17  
Many graph query languages rely on the composition operator to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting towards queries that use semi-joins instead. In this way, the cost of evaluating queries can be significantly reduced. We study techniques to recognize and apply such rewritings. Concretely, we study the relationship between the expressive power of the relation algebras, which rely heavily on composition, and the semi-join algebras, which replace the composition operator with the semi-join operators. As our main result, we show that each fragment of the relation algebras where intersection and/or difference is only used on edges (and not on complex compositions) is expressively equivalent to a fragment of the semi-join algebras. This expressive equivalence holds for node queries that evaluate to sets of nodes. For practical relevance, we exhibit constructive steps for rewriting relation algebra queries to semi-join algebra queries, and prove that these steps lead to only a well-bounded increase in the number of steps needed to evaluate the rewritten queries. In addition, on node-labeled graphs that are sibling-ordered trees, we establish new relationships among the expressive power of Regular XPath, Conditional XPath, FO-logic, and the semi-join algebra augmented with restricted fixpoint operators.

CCS CONCEPTS • Theory of computation → Database query processing and optimization (theory); Logic and databases; Database query languages (principles); Finite Model Theory

ACM Reference format: Jelle Hellings, Catherine L. Pilachowski, Dirk Van Gucht, Marc Gyssens, and Yuqing Wu. 2017. From relation algebra to semi-join algebra: an approach for graph query optimization.

Figure 2: Labeled binary relations representing the graph data in Figure 1.

To query such graph data, many navigational query languages have been developed which, at their core, use a fragment of the relation algebra of Tarski [25], augmented with the Kleene-star
doi:10.1145/3122831.3122833 dblp:conf/dbpl/HellingsPGGW17 fatcat:4f2ytnvfprdc7le3e44kbp6njq
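
The rewriting idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's construction, only a toy instance of the underlying observation: for a node query that asks which nodes start a path matching R followed by S, the composition R ∘ S can be swapped for a semi-join that filters R. The semi-join definition used here (keep the pairs of R whose target has an outgoing S edge), the function names, and the sample relations `knows` and `works_at` are all assumptions made for illustration.

```python
from typing import Set, Tuple

Node = str
Relation = Set[Tuple[Node, Node]]


def compose(r: Relation, s: Relation) -> Relation:
    """Composition: all (a, c) such that (a, b) is in r and (b, c) is in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def semijoin(r: Relation, s: Relation) -> Relation:
    """Assumed semi-join: the pairs of r whose target node is a source in s."""
    sources_of_s = {a for (a, _) in s}
    return {(a, b) for (a, b) in r if b in sources_of_s}


def source_nodes(r: Relation) -> Set[Node]:
    """Node query: project a binary relation onto its first column."""
    return {a for (a, _) in r}


# Hypothetical edge relations of a small labeled graph.
knows: Relation = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}
works_at: Relation = {("bob", "acme"), ("dan", "acme")}

# Both expressions select the nodes that know someone who works somewhere.
via_composition = source_nodes(compose(knows, works_at))
via_semijoin = source_nodes(semijoin(knows, works_at))
```

On this data both variants select the same nodes. The semi-join result is always a subset of `knows`, whereas the composition can hold up to |knows| · |works_at| pairs. That size difference is the kind of saving the abstract refers to.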