Making Aggregation Work in Uncertain and Probabilistic Databases

Raghotham Murthy, Robert Ikeda, Jennifer Widom
2011 IEEE Transactions on Knowledge and Data Engineering  
We describe how aggregation is handled in the Trio system for uncertain and probabilistic data. Because "exact" aggregation in uncertain databases can produce exponentially-sized results, we provide three alternatives: a low bound on the aggregate value, a high bound on the value, and the expected value. These variants return a single result instead of a set of possible results, and they are generally efficient to compute for both full-table and grouped aggregation queries. We provide formal definitions and semantics and a description of our open-source implementation for single-table aggregation queries. We study the performance and scalability of our algorithms through experiments over a large synthetic data set. We also provide some preliminary results on aggregations over joins.
Note to referees: Our initial results were published as a workshop paper [17]. This paper includes the following additional material: (1) More algorithms for different aggregate variants (Section IV). (2) Proofs of the key lemmas and theorem for approximating expected-average (Section V and Appendix). (3) Performance experiments (Section VI). (4) Preliminary results on aggregation over joins (Section VII).
doi:10.1109/tkde.2010.166 fatcat:yiuoz4e3g5hdnmb2d2axic4sim
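
The three aggregate variants named in the abstract can be illustrated for SUM with a minimal sketch. This is not the Trio implementation: the data representation (a list of uncertain tuples, each a list of mutually exclusive `(value, probability)` alternatives, absent with the leftover probability), the independence assumption between tuples, and all function names are hypothetical choices made for illustration. The brute-force `possible_sums` function is included only to show why exact aggregation can blow up: it enumerates every combination of alternatives.

```python
from itertools import product

EPS = 1e-9  # tolerance for floating-point probability sums


def _may_be_absent(alts):
    # Assumption: if the alternatives' probabilities sum to less than 1,
    # the tuple may be missing and then contributes 0 to SUM.
    return sum(p for _, p in alts) < 1.0 - EPS


def low_sum(table):
    """Smallest SUM over all possible instances (one pass over the table)."""
    total = 0.0
    for alts in table:
        candidates = [v for v, _ in alts]
        if _may_be_absent(alts):
            candidates.append(0.0)
        total += min(candidates)
    return total


def high_sum(table):
    """Largest SUM over all possible instances (one pass over the table)."""
    total = 0.0
    for alts in table:
        candidates = [v for v, _ in alts]
        if _may_be_absent(alts):
            candidates.append(0.0)
        total += max(candidates)
    return total


def expected_sum(table):
    """Expected SUM, using linearity of expectation."""
    return sum(v * p for alts in table for v, p in alts)


def possible_sums(table):
    """Exact distribution of SUM by enumerating every possible instance.

    The number of instances is the product of the per-tuple choice counts,
    so this grows exponentially with the number of uncertain tuples.
    """
    choices = []
    for alts in table:
        options = list(alts)
        absent = 1.0 - sum(p for _, p in alts)
        if absent > EPS:
            options.append((0.0, absent))
        choices.append(options)
    dist = {}
    for world in product(*choices):
        s = sum(v for v, _ in world)
        prob = 1.0
        for _, p in world:
            prob *= p
        dist[s] = dist.get(s, 0.0) + prob
    return dist
```

Under these assumptions, the bounds and the expected value each need a single scan of the table, whereas the exact distribution requires visiting every possible instance. On small inputs the two approaches can be cross-checked: the minimum and maximum keys of `possible_sums` should match `low_sum` and `high_sum`, and the probability-weighted mean of its keys should match `expected_sum`.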