Incorporating Uncertainty Metrics into a General-Purpose Data Integration System

Brenton Louie, Landon Detwiler, Nilesh Dalvi, Ron Shaker, Peter Tarczy-Hornoch, Dan Suciu
2007 International Conference on Scientific and Statistical Database Management  
There is a significant need for data integration capabilities in the scientific domain, which has given rise to products in both the commercial world and academia. However, in our experience with biological data, it has become apparent that existing data integration products do not handle uncertainties in the data very well. As a result, such systems often produce an explosion of less relevant answers, which overloads the user and causes more relevant answers to be lost. How to incorporate functionality into data integration systems to properly handle uncertainties and make results more useful has become an important research question. In this paper we describe an enhanced general-purpose data integration system which incorporates uncertainty metrics within a formal probabilistic framework. Additionally, for evaluation purposes, we have implemented a use case scenario which utilizes biological data sources, and performed a study which provides validation of system query results.
doi:10.1109/ssdbm.2007.36 dblp:conf/ssdbm/LouieDDSTS07 fatcat:aqwmk2azo5hpbp5bq3wi6kfdie