Multiple Alignment of Promoter Sequences from the Arabidopsis thaliana L. Genome

Eugene V. Korotkov, Yulia M. Suvorova, Dmitrii O. Kostenko, Maria A. Korotkova
2021 Genes  
In this study, we developed a new mathematical method for performing multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.5 substitutions per position (x). We generated sets of artificial DNA sequences with x ranging from 0 to 4.4 and applied MAHDS as well as currently used multiple sequence alignment algorithms, including ClustalW, MAFFT, T-Coffee, Kalign, and Muscle to these sets. The results indicated that most of the existing methods
more » ... d produce statistically significant alignments only for the sets with x < 2.5, whereas MAHDS could operate on sequences with x = 4.4. We also used MAHDS to analyze a set of promoter sequences from the Arabidopsis thaliana genome and discovered many conserved regions upstream of the transcription initiation site (from −499 to +1 bp); a part of the downstream region (from +1 to +70 bp) also significantly contributed to the obtained alignments. The possibilities of applying the newly developed method for the identification of promoter sequences in any genome are discussed. A server for multiple alignment of nucleotide sequences has been created.
doi:10.3390/genes12020135 pmid:33494278 pmcid:PMC7909805 fatcat:v3sc6wlmv5blfikgjbpfz7nydy