Association rule mining of gene ontology annotation terms for SGD

Anurag Nagar, Michael Hahsler, Hisham Al-Mubaid
2015 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)
Gene Ontology is one of the largest bioinformatics projects; it seeks to consolidate knowledge about genes through annotation of terms to three ontologies. In this work, we present a technique to find association relationships in the annotation terms for the Saccharomyces cerevisiae (SGD) genome. We first present a normalization algorithm to ensure that the annotation terms have a similar level of specificity. Association rule mining algorithms are then used to find significant and non-trivial association rules in these normalized datasets. Metrics such as support, confidence, and lift can be used to evaluate the strength of the rules found. We conducted experiments on the entire SGD annotation dataset, and here we present the top 10 strongest rules for each of the three ontologies. We verify the rules using evidence from the biomedical literature. The presented method has a number of advantages: it relies only on the structure of the gene ontology, has minimal memory and storage requirements, and can be easily scaled to large genomes, such as the human genome. There are many applications of this technique, such as predicting the GO annotations for new genes or for genes that have not been studied extensively.
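The three metrics named in the abstract have standard definitions, and a minimal sketch of how they could be computed for a single rule is given here. It is an illustration only and not the authors' implementation: the function name `rule_metrics`, the representation of each gene as a set of annotation terms, and the placeholder term labels `T1`, `T2`, `T3` are all assumptions, and the toy data is invented for the example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets, one set of annotation terms per gene (assumed layout).
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")

    # Count genes containing the antecedent, the consequent, and both together.
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_c = sum(1 for t in transactions if consequent <= t)
    count_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = count_ac / n                                 # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0    # P(C | A)
    lift = confidence / (count_c / n) if count_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical toy data: five genes annotated with placeholder terms.
genes = [
    {"T1", "T2"},
    {"T1", "T2", "T3"},
    {"T1"},
    {"T2", "T3"},
    {"T3"},
]

support, confidence, lift = rule_metrics(genes, {"T1"}, {"T2"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 indicates that the two terms co-occur more often than would be expected if they were independent, which is why lift is commonly used alongside support and confidence to rank rules.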
doi:10.1109/cibcb.2015.7300289 dblp:conf/cibcb/NagarHA15 fatcat:3qdgp2nzyjebffgxymonp7amou