GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support

M. Rebhan, V. Chalifa-Caspi, J. Prilusky, D. Lancet
1998 Bioinformatics  
Motivation: Modern biology is shifting from the 'one gene one postdoc' approach to genomic analyses that include the simultaneous monitoring of thousands of genes. The importance of efficient access to concise and integrated biomedical information to support data analysis and decision making is therefore increasing rapidly, in both academic and industrial research. However, knowledge discovery in the widely scattered resources relevant for biomedical research is often a cumbersome and
more » ... l task, one that requires a significant amount of training and effort. Results: To develop a model for a new type of topic-specific overview resource that provides efficient access to distributed information, we designed a database called 'GeneCards'. It is a freely accessible Web resource that offers one hypertext 'card' for each of the more than 7000 human genes that currently have an approved gene symbol published by the HUGO/GDB nomenclature committee. The presented information aims at giving immediate insight into current knowledge about the respective gene, including a focus on its functions in health and disease. It is compiled by Perl scripts that automatically extract relevant information from several databases, including SWISS-PROT, OMIM, Genatlas and GDB. Analyses of the interactions of users with the Web interface of GeneCards triggered development of easy-to-scan displays optimized for human browsing. Also, we developed algorithms that offer 'ready-to-click' query reformulation support, to facilitate information retrieval and exploration. Many of the long-term users turn to GeneCards to quickly access information about the function of very large sets of genes, for example in the realm of large-scale expression studies using 'DNA chip' technology or two-dimensional protein electrophoresis.
doi:10.1093/bioinformatics/14.8.656 pmid:9789091 fatcat:w4rmc4ew6rg3jnn4hcvuih6xxe