A Divide-and-Conquer Implementation of Three Sequence Alignment and Ancestor Inference

Feng Yue, Jijun Tang
2007 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2007)
In this paper, we present an algorithm that simultaneously aligns three biological sequences under an affine gap model and infers their common ancestral sequence. The algorithm can be extended to perform tree alignment for more sequences, and eventually to unify the two procedures of phylogenetic reconstruction and sequence alignment. Its novelty is that it applies a divide-and-conquer strategy to reduce memory usage from O(n^3) to O(n^2) while remaining based on dynamic programming, so an optimal alignment is still guaranteed. Traditionally, three-sequence alignment has been limited by its huge memory demand and could only handle sequences shorter than two hundred characters. With the improved algorithm, we can produce optimal alignments of sequences several thousand characters long. We implemented our algorithm as a C program package, MSAM. It has been extensively tested with BAliBASE, a manually refined database of real multiple sequence alignments, as well as with simulated datasets generated by Rose (Random Model of Sequence Evolution). We compared our results with those of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. The experiments show that MSAM produces not only better alignments but also better ancestral sequences. The software can be downloaded for free at
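The memory reduction described in the abstract can be illustrated with a minimal sketch. It is not the MSAM implementation, and it makes simplifying assumptions that the paper does not: a linear gap cost instead of the affine model, a sum-of-pairs column cost instead of an ancestor-based score, and hypothetical function names (`pair_cost`, `column_cost`, `align3_cost`). The sketch computes only the optimal cost. It fills the three-dimensional dynamic programming table one plane at a time and keeps just two planes in memory, so storage is quadratic rather than cubic in the sequence length. A divide-and-conquer alignment in the style of Hirschberg would typically build on such a score-only pass, running it from both ends to find a split point and then recursing on the two halves.

```python
from itertools import product

GAP = "-"
# The 7 column types: each sequence either contributes a character (1) or a gap (0).
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]


def pair_cost(x, y, mismatch=1, gap=1):
    """Cost of one aligned pair (assumed linear gap cost; gap-gap is free)."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return gap
    return 0 if x == y else mismatch


def column_cost(x, y, z):
    """Sum-of-pairs cost of one alignment column (an illustrative choice)."""
    return pair_cost(x, y) + pair_cost(x, z) + pair_cost(y, z)


def align3_cost(a, b, c):
    """Optimal three-way alignment cost, storing only two 2-D planes."""
    inf = float("inf")
    nb, nc = len(b), len(c)
    prev = None  # plane for index i - 1 of sequence a
    for i in range(len(a) + 1):
        cur = [[inf] * (nc + 1) for _ in range(nb + 1)]
        for j in range(nb + 1):
            for k in range(nc + 1):
                if i == 0 and j == 0 and k == 0:
                    cur[j][k] = 0
                    continue
                best = inf
                for di, dj, dk in MOVES:
                    if di > i or dj > j or dk > k:
                        continue  # move would step outside the table
                    plane = prev if di else cur
                    col = (
                        a[i - 1] if di else GAP,
                        b[j - 1] if dj else GAP,
                        c[k - 1] if dk else GAP,
                    )
                    best = min(best, plane[j - dj][k - dk] + column_cost(*col))
                cur[j][k] = best
        prev = cur  # the older plane is discarded here
    return prev[nb][nc]
```

For example, `align3_cost("AC", "AC", "A")` evaluates one fully matching column plus one column with a single gap. Running time is still cubic in the sequence length; only the storage shrinks.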
doi:10.1109/bibm.2007.40 dblp:conf/bibm/DiMaioSPS07 fatcat:tizqnmop4jcsnidj6gh5j6r4k4