Gene recognition in eukaryotic DNA by comparison of genomic sequences

P. S. Novichkov, M. S. Gelfand, A. A. Mironov
2001 Bioinformatics  
Motivation: Sequencing of complete eukaryotic genomes and of large syntenic genome fragments makes it possible to use genomic comparison for gene recognition. Results: This paper describes a spliced alignment algorithm that aligns candidate exon chains of two homologous genomic sequence fragments from different species. The algorithm is implemented in the Pro-Gen software. Unlike other algorithms, Pro-Gen does not assume conservation of the exon-intron structure. Amino acid sequences obtained by formal translation of candidate exons are aligned instead of nucleotide sequences, which allows comparisons between distantly related species. The algorithm was tested on samples of human-mammal (mouse), human-vertebrate (Xenopus) and human-invertebrate (Drosophila) gene pairs. Surprisingly, the best results, 97-98% correlation between the actual and predicted genes, were obtained for the more distant comparisons, whereas the correlation on the human-mouse sample was only 93%. This value increases to 95% if conservation of the exon-intron structure is assumed. The lower human-mouse accuracy is caused by extensive sequence conservation in non-coding regions of human and mouse genes, probably due to regulatory elements.
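The core idea, choosing a chain of candidate exons whose formal translation aligns best at the amino acid level, can be illustrated with a minimal sketch. This is NOT Pro-Gen's algorithm: it is the simpler one-sided spliced alignment (candidate exons from one genomic sequence aligned against a fixed protein), whereas Pro-Gen aligns exon chains from two genomic sequences against each other. The function names, the identity-based scores (MATCH, MISMATCH, GAP) and the assumption that every candidate exon length is a multiple of 3 (so each exon is translated in frame 0) are hypothetical simplifications for illustration only.

```python
from typing import List, Optional, Tuple

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Hypothetical illustrative scores, not the paper's scoring scheme.
MATCH, MISMATCH, GAP = 2, -1, -2
NEG = float("-inf")


def translate(dna: str) -> str:
    """Formal translation in frame 0; unknown codons become 'X'."""
    return "".join(CODE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3))


def spliced_align(genome: str, exons: List[Tuple[int, int]], protein: str):
    """Return (score, chain) for the best chain of non-overlapping candidate
    exons (half-open [start, end) intervals, lengths assumed divisible by 3)
    whose concatenated translation aligns globally to `protein`."""
    exons = sorted(exons)
    peps = [translate(genome[s:e]) for s, e in exons]
    m = len(protein)
    last, entry, pred = [], [], []
    for x, (start, _) in enumerate(exons):
        # pre[j]: best score for protein[:j] before entering exon x
        # (either an earlier compatible chain or the empty chain).
        pre = [GAP * j for j in range(m + 1)]
        prv: List[Optional[int]] = [None] * (m + 1)
        for y in range(x):
            if exons[y][1] <= start:  # exon y ends before exon x begins
                for j in range(m + 1):
                    if last[y][j] > pre[j]:
                        pre[j], prv[j] = last[y][j], y
        # Global-alignment DP through the peptide of exon x, remembering
        # the protein column at which the alignment entered this exon.
        score, ent = pre[:], list(range(m + 1))
        for a in peps[x]:
            new, new_ent = [NEG] * (m + 1), [0] * (m + 1)
            for j in range(m + 1):
                cands = [(score[j] + GAP, ent[j])]  # exon residue vs gap
                if j > 0:
                    sub = MATCH if a == protein[j - 1] else MISMATCH
                    cands.append((score[j - 1] + sub, ent[j - 1]))
                    cands.append((new[j - 1] + GAP, new_ent[j - 1]))
                new[j], new_ent[j] = max(cands)
            score, ent = new, new_ent
        last.append(score)
        entry.append(ent)
        pred.append(prv)
    # Trace the chain back through entry columns and predecessor exons.
    best = max(range(len(exons)), key=lambda x: last[x][m])
    chain, x, j = [], best, m
    while x is not None:
        chain.append(exons[x])
        j = entry[x][j]
        x = pred[x][j]
    return last[best][m], chain[::-1]


genome = "ATGGCC" + "CCC" + "TTTTTT" + "CCC" + "AAAGGG"
score, chain = spliced_align(genome, [(0, 6), (9, 15), (18, 24)], "MAKG")
print(score, chain)
```

In the usage example the decoy exon at [9, 15) translates to "FF" and is skipped, because the chain "MA" + "KG" matches the protein exactly. Note that this sketch does not require the chain to reproduce any particular exon-intron structure, only that the exons are non-overlapping and ordered, which loosely mirrors the paper's point about not assuming a conserved exon-intron structure.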
doi:10.1093/bioinformatics/17.11.1011 pmid:11724729 fatcat:kvrap5zd2jd4bd7p3ywhcdqn7y