Non‐homology‐based prediction of gene functions in maize (Zea mays ssp. mays)

Xiuru Dai, Zheng Xu, Zhikai Liang, Xiaoyu Tu, Silin Zhong, James C. Schnable, Pinghua Li
2020 The Plant Genome  
Advances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions. As a result, homology is widely used for gene function prediction. This means functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non-homology gene features. Among the eight supervised classification algorithms evaluated, random-forest-based prediction consistently provided the most accurate gene function prediction. Non-homology-based functional annotation provides complementary strengths to homology-based annotation, with higher average performance in Biological Process GO terms, the domain where homology-based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology-based functional annotation is highest. GO prediction models trained with homology-based annotations were able to successfully predict annotations from a manually curated "gold standard" GO annotation set. Non-homology-based functional annotation based on machine learning may ultimately prove useful both as a method to assign predicted functions to orphan genes which lack functionally characterized homologs, and to identify and correct functional annotation errors which were propagated through homology-based functional annotations.
doi:10.1002/tpg2.20015 pmid:33016608 fatcat:lldl4wonu5eh3fll27ebd4hhma