New Method for Sequence Alignment Based on Probabilities of Nucleotide Correspondences

R. Dimitrov, D. Gouliamova
2012 Biotechnology & Biotechnological Equipment  
The objective of our work is to develop a general method for simultaneously optimizing the alignment and self-folding of structurally related but diverged sequences, the so-called Sankoff's program for simultaneous prediction of secondary structure and alignment of nucleotide sequences. The rationale for simultaneous optimization is that a strong structural consensus among related but diverged sequences is a good indicator of a preserved functional role. To date there is no general solution to this long-standing problem. Here we discuss an approach that is only a first step toward the full realization of Sankoff's program. Currently available models and software packages, such as foldalign, dynalign and others, implement only restricted versions of Sankoff's program (variations on first aligning and then folding, or the reverse) and do not use the full loop-based RNA/DNA energy model. We divided Sankoff's program into two steps, based on the analogy between the classical alignment algorithm and hybridization without self-folding. The next step is to incorporate a self-folding algorithm into the alignment. In our approach, the alignment problem requires implementing the full loop-based RNA/DNA energy model for the hybridization of two sequences. To this end, we divided the alignment between two sequences into loops and assigned a score to each loop in such a way that the total score of the alignment is the sum of the scores of its loops. The loop scoring model for alignment consists of the following loop types: stacking with matched and mismatched pairs, bulges, internal loops and dangling ends. The calculation of the thermodynamic partition function over all possible double-stranded conformations is interpreted in terms of all possible canonical pairwise alignments. The partition function is computed by a dynamic programming algorithm and is used to determine the probability of an alignment as well as the probability of each possible match between two sequence positions. To calculate match probabilities, detailed recursion relations for the partition functions of alignments are derived from the analogous recursions for the hybridization of subsequences. The partition function is also used for backtracking and for reconstructing a properly weighted ensemble of optimal and suboptimal alignments.
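
The way a partition function yields match probabilities can be illustrated with a minimal sketch. It is not the loop-based energy model described in the abstract: as a simplifying assumption, each alignment column gets an independent Boltzmann weight (match, mismatch, or linear gap), and every alignment path is counted separately, with no restriction to canonical alignments. The function name `match_probabilities` and all of its parameters and default values are hypothetical and chosen only for illustration.

```python
import math


def match_probabilities(a, b, match=1.0, mismatch=-1.0, gap=-1.0, temperature=1.0):
    """Return (Z, P), where P[i][j] is the probability that a[i] is aligned to b[j].

    Simplified model (an assumption, not the loop-based model of the abstract):
    the weight of an alignment is the product of per-column Boltzmann factors.
    """
    n, m = len(a), len(b)
    w_gap = math.exp(gap / temperature)

    def w_pair(x, y):
        # Boltzmann weight of aligning residue x with residue y.
        return math.exp((match if x == y else mismatch) / temperature)

    # fwd[i][j]: partition function over all alignments of a[:i] with b[:j].
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                fwd[i][j] += fwd[i - 1][j] * w_gap  # a[i-1] against a gap
            if j > 0:
                fwd[i][j] += fwd[i][j - 1] * w_gap  # b[j-1] against a gap
            if i > 0 and j > 0:
                fwd[i][j] += fwd[i - 1][j - 1] * w_pair(a[i - 1], b[j - 1])

    # bwd[i][j]: partition function over all alignments of a[i:] with b[j:].
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n:
                bwd[i][j] += bwd[i + 1][j] * w_gap
            if j < m:
                bwd[i][j] += bwd[i][j + 1] * w_gap
            if i < n and j < m:
                bwd[i][j] += bwd[i + 1][j + 1] * w_pair(a[i], b[j])

    z = fwd[n][m]  # total partition function; equals bwd[0][0]
    # Weight of all alignments containing the column (a[i], b[j]), divided by Z.
    probs = [
        [fwd[i][j] * w_pair(a[i], b[j]) * bwd[i + 1][j + 1] / z for j in range(m)]
        for i in range(n)
    ]
    return z, probs


if __name__ == "__main__":
    z, probs = match_probabilities("ACGT", "AGT")
    print("Z =", z)
    for row in probs:
        print(["%.3f" % p for p in row])
```

In each row of `probs`, the entries sum to at most 1; the remainder is the probability that the corresponding position of the first sequence is aligned to a gap. Replacing the per-column weights with loop-dependent weights would require a recursion that tracks the type of the preceding column, which this sketch does not attempt.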
doi:10.5504/50yrtimb.2011.0039 fatcat:sbdgeei2rfgtfk6d54tma2jeui