Incremental maintenance of aggregate and outerjoin expressions

Himanshu Gupta, Inderpal Singh Mumick
2006 Information Systems  
Views stored in a data warehouse need to be kept current. As recomputing the views is very expensive, incremental maintenance algorithms are required. Over recent years, several incremental maintenance algorithms have been proposed. None of the proposed algorithms handle the general case of relational expressions involving aggregate and outerjoin operators efficiently. In this article, we develop the change-table technique for incrementally maintaining general view expressions involving ... al and aggregate operators. We show that the change-table technique outperforms the previously proposed techniques by orders of magnitude. The developed framework easily extends to efficiently maintaining view expressions containing outerjoin operators. We prove that the developed change-table technique is an optimal incremental maintenance scheme for a given view expression tree under some reasonable assumptions.

[...] as the last operator of the expression tree. Gupta et al. [GJM97] show how to maintain a simple outerjoin view, but do not address general expressions involving outerjoin operators.¹

• To date, most incremental maintenance approaches compute and propagate insertions and deletions at each node in a view expression tree, which can be very inefficient for view expressions that involve aggregation or outerjoin operators.

Our Contributions. In this article, we develop the change-table technique for incremental maintenance of general view expressions involving aggregate and outerjoin operators. The change table of a view is applied to the view using a special refresh operator. We develop techniques for computing and propagating change tables through the various operators in a given view expression, in response to changes at the base relations. In contrast to previously developed techniques, which propagate data in terms of insertions and deletions through a view expression, the change-table technique propagates data (in terms of change tables) as well as action (in terms of parameters of the refresh operation) through the given view expression. We show that the change-table framework yields very efficient incremental maintenance expressions for general view expressions.

Paper Organization. In the rest of this section, we present some basic notation used throughout this article.
Section 2 presents a motivating example that illustrates the idea behind this paper and contrasts previous techniques with the change-table technique developed in this article. The example shows that the change-table technique outperforms the previously proposed techniques by orders of magnitude. In Section 3, we briefly describe how our work fits into the previous frameworks of incremental view maintenance algorithms. In Section 4, we define the refresh operator used to apply the changes represented as a change table and briefly outline its implementation. In Section 5, we discuss propagation of change tables that originate at an aggregate operator. Section 6 discusses propagation of change tables that originate at an outerjoin node. We discuss the optimality of our techniques under a reasonable cost model in Section 7. A brief survey of related work is presented in Section 8. Finally, we present our concluding remarks in Section 9.

Notations. We consider only bag semantics in this article, i.e., all the relational operators used are duplicate-preserving. We use ⊎ to denote bag union, ∸ to denote monus (bag minus), ∇E to denote deletions from a bag-algebra expression E, ΔE to denote insertions into E, σ_p to denote selection on condition p, Π_A to denote duplicate-preserving projection on a set of attributes A, π to denote the generalized projection operator (note that we use slightly different symbols for the duplicate-preserving projection (Π) and generalized projection (π) operators), × to denote cross-product, ⋈ to denote natural join, and ⋈_J and fo_J to denote join and full outerjoin operations with the join condition J. The symbols lo and ro are used for left and right outerjoin, respectively. Also, Attrs(J) denotes the set of attributes used in a predicate J or a relation J. The only operators that may require explanation are the outerjoin and generalized projection operators.
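As a small illustration of the bag notation above, the following Python sketch models bags with `collections.Counter`, whose `+` and `-` operators behave as bag union and monus (subtraction that never takes a multiplicity below zero). The relation contents and the helper names (`bag_union`, `monus`, `project`, `apply_changes`) are hypothetical. The rule used in `apply_changes`, (E monus deletions) bag-union insertions, is the conventional way of applying a set of deletions and insertions to a bag and is assumed here rather than quoted from this section.

```python
from collections import Counter


def bag_union(r, s):
    """Bag union: multiplicities of equal tuples add up."""
    return r + s


def monus(r, s):
    """Monus (bag minus): multiplicities subtract, floored at zero."""
    return r - s


def project(bag, positions):
    """Duplicate-preserving projection onto the given attribute positions."""
    out = Counter()
    for tup, count in bag.items():
        out[tuple(tup[i] for i in positions)] += count
    return out


def apply_changes(e, deletions, insertions):
    """Assumed update rule: (E monus deletions) bag-union insertions."""
    return bag_union(monus(e, deletions), insertions)


# Hypothetical bag E with one duplicated tuple.
E = Counter({("NY", "I"): 2, ("NY", "J"): 1})
deleted = Counter({("NY", "I"): 1, ("CA", "K"): 1})  # ("CA", "K") is not in E
inserted = Counter({("NY", "J"): 2})

E_new = apply_changes(E, deleted, inserted)
print(E_new)
print(project(E_new, [0]))
```

Deleting a tuple that is not present has no effect under monus, and the projection keeps one output tuple per input tuple, which is what distinguishes the duplicate-preserving projection from a set projection.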
The (full) outerjoin differs from an ordinary join by including in the result any "dangling"² tuple of either relation after "padding" it with NULLs in those attributes that belong to the other relation. For example, R(A, B) fo_{R.B=S.B} S(B, C) will include a tuple (a, b, NULL, NULL) if (a, b) ∈ R and (b, c) ∉ S for any c. One variant of the outerjoin operator is a left (right) outerjoin, where the dangling tuples of only the left (right) operand relation are padded with NULLs and included in the result. Hence, in the above example, (a, b, NULL, NULL) would be included in R lo_J S, but not in R ro_J S. The generalized projection operator [...]

¹ In work done concurrently with ours, Griffin and Kumar [GK98] derived expressions for propagating insertions and deletions through outerjoin operators.
² Dangling tuples are the ones that fail to join with any tuple from the other relation.
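The outerjoin definitions above can be made concrete with a short Python sketch. It represents bags as lists of tuples and uses a plain nested loop for clarity rather than efficiency. The function name `outerjoin`, its `keep_left`/`keep_right` flags, and the sample relations are assumptions made for illustration; the inputs are assumed to contain no NULLs, so NULL comparison semantics are not modelled.

```python
NULL = None  # stand-in for the NULL padding value


def outerjoin(r, s, cond, r_arity, s_arity, keep_left=True, keep_right=True):
    """Join bags r and s on cond, padding dangling tuples with NULLs.

    keep_left / keep_right select which side's dangling tuples are kept:
    both True gives a full outerjoin, only keep_left a left outerjoin,
    only keep_right a right outerjoin.
    """
    out = []
    matched_s = set()  # positions in s that joined with at least one r tuple
    for t in r:
        joined = False
        for j, u in enumerate(s):
            if cond(t, u):
                out.append(t + u)
                matched_s.add(j)
                joined = True
        if not joined and keep_left:
            out.append(t + (NULL,) * s_arity)  # dangling tuple of r
    if keep_right:
        for j, u in enumerate(s):
            if j not in matched_s:
                out.append((NULL,) * r_arity + u)  # dangling tuple of s
    return out


# Hypothetical relations R(A, B) and S(B, C); ("a", "b") has no partner in S.
R = [("a", "b"), ("a2", "b2")]
S = [("b2", "c2"), ("b3", "c3")]
on_b = lambda t, u: t[1] == u[0]  # join condition R.B = S.B

full = outerjoin(R, S, on_b, 2, 2)
left = outerjoin(R, S, on_b, 2, 2, keep_right=False)
right = outerjoin(R, S, on_b, 2, 2, keep_left=False)
print(full)
print(left)
print(right)
```

With these inputs the padded tuple `("a", "b", None, None)` appears in the full and left outerjoin results but not in the right outerjoin, matching the example in the text.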
doi:10.1016/ fatcat:sp4c5ymqcvak3aepbfdd32uube