Tracing Lineage in Multi-version Scientific Databases

Mingwu Zhang, Daisuke Kihara, Sunil Prabhakar
2007 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering  
The critical need for better tracing of lineage in scientific databases is well known [6]. It is clear that performance is not an issue for most domain scientists; rather, the functionality is more important. In this paper, we highlight the importance of maintaining multiple versions of data and tracing fine-grained lineage in support of these needs. We study alternatives for managing versions, and propose a model for the example application of protein annotations. We present query rewriting algorithms for SPJ and ASPJ queries that piggy-back lineage computation with query evaluation. Our models are implemented using PostgreSQL and tested using a large, real dataset from UniProt. We establish the validity of the approach in enabling relevant queries and study the space and time overheads. While these overheads can be high in some cases, the real gain for scientists is the novel functionality that can allow them to ascertain the reliability of derived data and foster data-driven research. To the best of our knowledge, this is the first work that can handle these types of queries for lineage tracing.
doi:10.1109/bibe.2007.4375599 dblp:conf/bibe/ZhangKP07 fatcat:jflnzvi64vgzrogtr7kvywfme4