XML query optimization in the presence of side effects

Giorgio Ghelli, Nicola Onose, Kristoffer Rose, Jerome Simeon
2008 Proceedings of the 2008 ACM SIGMOD international conference on Management of data - SIGMOD '08  
The emergence of database languages with side effects, notably for XML, raises significant challenges for database compilers and optimizers. In this paper, we extend an algebra for the W3C XML query language with operations that allow data to be immediately updated. We study the impact of that extension on logical optimization, join detection, and pipelining. The main result of this work is to show that, with proper care, a number of important optimizations based on nested relational algebras remain applicable in the presence of side effects. Our approach relies on an analysis of the conditions that must be checked in order for algebraic rewritings to hold. An implementation and experimental results demonstrate the effectiveness of the approach.

[...] preserving essential optimizations based on algebraic rewritings. Surprisingly, there has been very little work in that area in the past. One notable exception is [10], which uses a state monad [20] to support side effects in a nested-relational calculus. However, optimization at the algebraic, logical and physical level is not addressed. To the best of our knowledge, we provide the first treatment of side effects for a nested-relational algebra. Due to space constraints, we limit our scope to updates applied during query evaluation, and leave procedural extensions (notably variable assignment) for future work. We start with a motivating example.

Use case. Consider a simple scenario inspired by the sample retail application [1] provided with BEA's AquaLogic DSP [4]. This scenario assumes two XML data sources located at an on-line retailer site, named 'customers.xml' and 'orders.xml', made accessible by two XQuery functions, getCustomers and getUnconfirmedOrders. A customer can place orders, which are put on hold until they are confirmed by that customer.
The application also maintains, through updates, access timestamps for customer data:

    1 declare updating function getCustomers() {
    2   for $c1 in doc('customers.xml')//customer
    3   return
    4     ( do replace value of $c1/timestamp with gettime(),
    5       $c1 )
    6 };
    7 declare function getUnconfirmedOrders($cid) {
    [...]

In the absence of the do replace update on line 4, standard query optimizers would identify the query as a typical case of outer join between customers and orders (after function inlining and using query unnesting techniques). In Section 3, we will see how similar nested queries can be optimized. We now give an overview of our approach.

Queries with side effects. We consider an extension of XQuery 1.0 [2] with simple update expressions. We adopt a semantics that imposes a strict left-to-right evaluation order and the immediate application of updates, in the style of [...]

The language semantics specifies the following evaluation order: (i) retrieve the list of customer elements; (ii) expand the list of "gold" customers to a list of tuples with each customer paired with all possible purchase orders; (iii) filter away the tuples that violate the predicate; (iv) for each tuple, first remove all membership element nodes in the document and then return the customer element (the deletion has no effect from the second tuple onward). In this query, the optimizer should be able to use a join plan, since the updates occur only after all previous clauses are evaluated.
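The evaluation order (i) to (iv) described above can be illustrated with a minimal Python sketch. It is not the paper's algebra or implementation: the in-memory records, the field names (id, level, membership, cid) and the functions make_store, nested_loop_plan and hash_join_plan are all hypothetical stand-ins, and the join predicate (customer id equals order cid) is an assumption. The sketch only shows that, when the update is confined to the final return step, a join plan yields the same result and the same final document state as the literal nested-loop order.

```python
# Hypothetical in-memory stand-ins for the XML sources; all names and
# values here are invented for illustration and are not from the paper.
def make_store():
    customers = [
        {"id": 1, "level": "gold", "membership": ["m1"]},
        {"id": 2, "level": "gold", "membership": ["m2"]},
        {"id": 3, "level": "silver", "membership": ["m3"]},
    ]
    orders = [
        {"cid": 1, "item": "book"},
        {"cid": 2, "item": "lamp"},
        {"cid": 2, "item": "desk"},
    ]
    return customers, orders


def delete_all_memberships(customers):
    """Immediately applied update: remove every membership node."""
    for c in customers:
        c["membership"].clear()


def return_clause(matching, customers):
    """Step (iv): for each tuple, apply the update, then return the customer."""
    result = []
    for c, _order in matching:
        delete_all_memberships(customers)  # no effect after the first tuple
        result.append(c["id"])
    return result


def nested_loop_plan(customers, orders):
    """Literal left-to-right evaluation of steps (i) to (iv)."""
    gold = [c for c in customers if c["level"] == "gold"]        # (i)
    tuples = [(c, o) for c in gold for o in orders]              # (ii)
    matching = [(c, o) for c, o in tuples if c["id"] == o["cid"]]  # (iii)
    return return_clause(matching, customers)                    # (iv)


def hash_join_plan(customers, orders):
    """Join plan: build a hash table on orders, probe with gold customers."""
    by_cid = {}
    for o in orders:
        by_cid.setdefault(o["cid"], []).append(o)
    matching = [
        (c, o)
        for c in customers
        if c["level"] == "gold"
        for o in by_cid.get(c["id"], [])
    ]
    return return_clause(matching, customers)


if __name__ == "__main__":
    c1, o1 = make_store()
    c2, o2 = make_store()
    print(nested_loop_plan(c1, o1))
    print(hash_join_plan(c2, o2))
    print(c1 == c2)  # compares the final state of the two "documents"
```

Both plans read customers and orders before any update runs, which is why the rewrite is safe in this sketch. If the update were moved into an earlier clause, so that it could run while the tuples are still being built, the two plans could observe different states, and this is the kind of condition an optimizer would have to check.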
doi:10.1145/1376616.1376653 dblp:conf/sigmod/GhelliORS08 fatcat:owylrblxnfa6ned2xn4qw5qgoa