Reconstructing latent periods in genome sequences with insertions and deletions

Raman Arora, Colin Dewey, William A. Sethares
2009 IEEE International Workshop on Genomic Signal Processing and Statistics
Tandem and latent repeats in genome sequences provide insight into their various structural and functional roles. Such regions in genome sequences are modeled as cyclostationary processes, generated by a collection of information sources in a cyclic manner. Maximum likelihood (ML) estimates of the cyclostationary profiles and of the statistical period of such subsequences are easy to compute. However, in the presence of insertions and deletions, the ML estimators lose much of their ability to identify the periods accurately. This paper extends the cyclic model to a profile hidden Markov model (PHMM) to account for insertions and deletions. An iterative algorithm is developed to learn the parameters of the PHMM, and the Viterbi algorithm is employed to find the most likely path through the state space. This reconstructs likely insertions and deletions in the sequence and yields better estimates of the statistical period and cyclostationary profiles than the ML approach. Experimental results are provided for simulated sequences as well as for the chromosome 1 sequence from the human genome.
doi:10.1109/gensips.2009.5174377 dblp:conf/gensips/AroraDS09 fatcat:ngtsmnqarzdj7cstz6lf4dpdcm
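
The cyclic model in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each phase of a candidate period emits symbols from its own categorical distribution over A, C, G, T, estimates those profiles by counting, and scores each candidate period by the fitted log-likelihood. The function names, the pseudocount, and the BIC-style complexity penalty are assumptions made for this illustration; the abstract does not say how the paper compares periods of different lengths. The PHMM and Viterbi steps are not shown.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed symbol set for this illustration


def profile_log_likelihood(seq, period, pseudocount=0.5):
    """Fit one categorical profile per phase by counting and return the
    log-likelihood of seq under those smoothed profiles."""
    counts = [Counter() for _ in range(period)]
    for i, symbol in enumerate(seq):
        counts[i % period][symbol] += 1
    log_lik = 0.0
    for phase_counts in counts:
        total = sum(phase_counts.values()) + pseudocount * len(ALPHABET)
        for symbol in ALPHABET:
            n = phase_counts[symbol]
            if n:
                log_lik += n * math.log((n + pseudocount) / total)
    return log_lik


def estimate_period(seq, max_period, pseudocount=0.5):
    """Return the candidate period with the best penalized log-likelihood.

    The penalty (half the free-parameter count times log of the sequence
    length) is an assumption of this sketch, added so that longer periods
    do not win merely by having more parameters.
    """
    n = len(seq)
    best_period, best_score = None, -math.inf
    for period in range(1, max_period + 1):
        log_lik = profile_log_likelihood(seq, period, pseudocount)
        n_params = period * (len(ALPHABET) - 1)
        score = log_lik - 0.5 * n_params * math.log(n)
        if score > best_score:
            best_period, best_score = period, score
    return best_period


clean = "ACGTT" * 40                    # synthetic sequence with period 5
damaged = clean[:100] + clean[101:]     # same sequence with one deletion

print(estimate_period(clean, 12))
print(profile_log_likelihood(clean, 5), profile_log_likelihood(damaged, 5))
```

A single deletion shifts the phase of everything after it, so the period-5 profiles of the damaged sequence mix symbols from neighboring phases and the fitted likelihood drops. This is the failure mode that the PHMM extension described in the abstract is meant to address.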