Making High-level Queries on Diverse Genome Data: A Structured Genome Document Database System Based on GXML and GQL

Aaron Stokes, Hideo Matsuda, Akihiro Hashimoto
1999 Genome Informatics Series  
Complete DNA sequences (genomes) and associated data are being made available worldwide at an astonishing rate. Through computer analysis of such data, molecular biologists hope to gain an overall understanding of the genome, such as by predicting large-scale gene networks. However, this is difficult because diverse genome data are scattered across many highly heterogeneous databases, and because existing database systems lack the facilities to expose and analyze functional relationships among the data. To address these problems, we propose a new type of genome database system. Since a genome can be thought of intuitively as a kind of 'document', our system uses a structured document language based on XML to effectively represent genomes and associated data. The information-rich structures of the genome documents help cope with data diversity and heterogeneity. A powerful query language is introduced that exposes important biological relationships among the genome data. We have obtained favorable results from several experiments, demonstrating the usefulness of our method in building a top-down view of genome functionality.
doi:10.11234/gi1990.10.176 fatcat:nw4h4jgv5fcspjdvp2epg7thqu