FILTERED DISTANCE MATRIX FOR CONSTRUCTING HIGH-THROUGHPUT MULTIPLE SEQUENCE ALIGNMENT ON PROTEIN DATA

Muhannad Abu-Hashem, Nur Aini, Abdul Rashid, Rosni Abdullah, Atheer Abdulrazzaq, Awsan Hasan
2016 Journal of Theoretical and Applied Information Technology   unpublished
Multiple sequence alignment (MSA) is a cornerstone process in computational biology and bioinformatics. Although numerous algorithms have been proposed for MSA, producing an MSA efficiently and with high accuracy remains a major challenge. The progressive alignment method is widely used for constructing MSAs. It takes guide trees as input to direct the alignment process. Pairwise alignment plays a significant role in building the distance matrices, which are in turn necessary for building the guide trees. A robust distance matrix therefore leads to a better MSA. In this research, we present the Filtered Distance Matrix for MSA (FDM-MSA) algorithm. FDM-MSA is divided into four phases: constructing the distance matrix, building the filtering system, building the guide tree, and constructing the MSA. The HashTable-N-Gram-Hirschberg (HT-NGH) algorithm is used to build the distance matrix. The filtering system involves two sequence detectors: a multi-domain detector and an outlier detector. After the distance matrix is filtered, the Neighbor Joining and progressive alignment methods are employed to construct the guide tree and the MSA, respectively. The experiments show that FDM-MSA improves performance in terms of both time and accuracy. It achieves the best running time among all competing methods on most datasets, the highest Sum-of-Pairs score on the RV2 dataset of the BAliBASE benchmark, and the second-best Total Column score on average.
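The first two phases (distance matrix, then filtering) can be illustrated with a minimal Python sketch. It is not the authors' method: a plain Jaccard distance over n-gram sets stands in for HT-NGH, and the outlier rule (a z-score on each sequence's mean distance to the others) is an assumption made only for illustration. The function names, the `z_cut` threshold, and the toy sequences are all hypothetical, and the multi-domain detector is not modelled.

```python
from itertools import combinations
from statistics import mean, pstdev


def ngrams(seq, n=2):
    """Return the set of overlapping n-grams in a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def ngram_distance(a, b, n=2):
    """Jaccard distance between n-gram sets (illustrative stand-in for HT-NGH)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    if not union:
        return 0.0
    return 1.0 - len(ga & gb) / len(union)


def distance_matrix(seqs, n=2):
    """Build a symmetric pairwise distance matrix with a zero diagonal."""
    size = len(seqs)
    d = [[0.0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        d[i][j] = d[j][i] = ngram_distance(seqs[i], seqs[j], n)
    return d


def flag_outliers(d, z_cut=1.5):
    """Return indices whose mean distance to the other sequences is unusually high.

    Hypothetical rule: flag a sequence when the z-score of its mean distance
    exceeds z_cut. The paper's actual outlier detector may differ.
    """
    size = len(d)
    if size < 3:
        return []
    avg = [sum(row) / (size - 1) for row in d]
    mu, sigma = mean(avg), pstdev(avg)
    if sigma == 0:
        return []
    return [i for i, a in enumerate(avg) if (a - mu) / sigma > z_cut]


if __name__ == "__main__":
    # Toy protein-like strings; the last one shares no 2-grams with the rest.
    seqs = ["MKVLAAGIV", "MKVLAAGIL", "MKVLSAGIV", "MKILAAGIV", "WWPPQQRRTT"]
    d = distance_matrix(seqs)
    print(flag_outliers(d))
```

In a full pipeline, the flagged indices would be handled separately before the filtered matrix is passed to a Neighbor Joining implementation to produce the guide tree. How FDM-MSA treats flagged sequences is not stated in the abstract.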
fatcat:qam42pojxfendfwfmfza74kgcy