Evaluation of probabilistic queries over imprecise data in constantly-evolving environments
Sensors are often employed to monitor continuously changing entities like locations of moving objects and temperature. The sensor readings are reported to a database system, and are subsequently used to answer queries. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), the database may not be able to keep track of the actual values of the entities. Queries that use these old values may produce incorrect answers. However, if the degree of
uncertainty between the actual data value and the database value is limited, one can place more confidence in the answers to the queries. More generally, query answers can be augmented with probabilistic guarantees of their validity. In this paper, we study probabilistic query evaluation based on uncertain data. We classify queries according to the nature of their result sets. For each class, we develop algorithms for computing probabilistic answers, and provide efficient indexing and numeric solutions. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments are performed to examine the effectiveness of several data update policies. The additions in this version are: (1) a new section (Section 5) on efficient evaluation of probabilistic queries, where disk-based uncertainty indexing and numerical methods are examined; (2) new sets of experimental results (Section 7) in a more realistic simulation model; (3) a method based on time-series analysis for obtaining a probability density function in the uncertainty model (Appendix A); (4) enhancement of probabilistic query evaluation algorithms to handle special cases of uncertainty (Appendix B); and (5) discussions on future work in Section 9, as well as more detailed examples.