Integrating and Ranking Uncertain Scientific Data

Landon Detwiler, Wolfgang Gatterbauer, Brent Louie, Dan Suciu, Peter Tarczy-Hornoch
2009 Proceedings / International Conference on Data Engineering  
The BioRank project investigates formalisms for modeling uncertainties of scientific data in a mediator-based data integration system. Our motivating application is predicting previously unknown functions of proteins. Much can be learned in biology by integrating known information with biological similarity functions and confidence values of experimental results. In this paper, we evaluate the role that probabilities can play in such a scenario with the help of real-world data. In particular, we show that: (i) explicit modeling of uncertainties increases the quality of our functional predictions for less-known and unknown protein functions, but not for well-known ones. This suggests exploratory search and new knowledge discovery as ideal application domains for probabilistic data integration; (ii) slight perturbations of the input probabilities do not severely affect the quality of our predictions. This suggests that probabilistic information integration is robust against slight variations in the way domain experts transform uncertainties into probabilities; and (iii) our probabilistic scoring functions can be evaluated efficiently with the help of several techniques. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.
doi:10.1109/icde.2009.209 dblp:conf/icde/DetwilerGLST09 fatcat:lvx3uur4dbc3lnlichykzs2iva