MSOAR: A High-Throughput Ortholog Assignment System Based on Genome Rearrangement

Zheng Fu, Xin Chen, Vladimir Vacic, Peng Nan, Yang Zhong, Tao Jiang
2007 Journal of Computational Biology  
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics, since many computational methods for solving various biological problems critically rely on bona fide orthologs as input. While it is usually done using sequence similarity search, we recently proposed a new combinatorial approach that combines sequence similarity and genome rearrangement. This paper continues the development of the approach and unites genome rearrangement events and (post-speciation) duplication events in a single framework under the parsimony principle. In this framework, orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario involving both genome rearrangement and (post-speciation) gene duplication. Besides several original algorithmic contributions, the enhanced method allows for the detection of inparalogs. Following this approach, we have implemented a high-throughput system for ortholog assignment on a genome scale, called MSOAR, and applied it to human and mouse genomes. As the results show, MSOAR is able to find 99 more true orthologs than the INPARANOID program. In comparison to the iterated exemplar algorithm on simulated data, MSOAR performed favorably in terms of assignment accuracy. We also validated our predicted main ortholog pairs between human and mouse using public ortholog assignment datasets, synteny information, and gene function classification. These test results indicate that our approach is very promising for genome-wide ortholog assignment. Supplemental material and the MSOAR program are available at
doi:10.1089/cmb.2007.0048 pmid:17990975 fatcat:q6pso2acsvfdxbjh2ymoodrg3i