Analysis of single-strand exceptional word symmetry in the human genome: new measures

V. Afreixo, J. M. O. S. Rodrigues, C. A. C. Bastos
2014 Biostatistics  
Some previous studies suggest the extension of Chargaff's second rule (the phenomenon of symmetry in a single DNA strand) to long DNA words. However, in random sequences generated under an independent symbol model where complementary nucleotides have equal occurrence probabilities, we expect the phenomenon of symmetry to hold for any word length. In this work, we develop new statistical methods to measure exceptional symmetry. Exceptional symmetry is a refinement of Chargaff's second parity rule that highlights the words whose frequency of occurrence is similar to that of their reversed complements but dissimilar to the frequencies of occurrence of other words which contain the same number of nucleotides A or T. We analyze words of lengths up to 12 in the complete human genome and in each chromosome separately. We assess exceptional symmetry globally, by word group, and by word. We conclude that the global symmetry present in the human genome is clearly exceptional and significant. The chromosomes present distinct exceptional symmetry profiles. There are several exceptional word groups and exceptional words with a strong exceptional symmetry.
doi:10.1093/biostatistics/kxu041 pmid:25190514 fatcat:sjivp4fxv5az3a2w6nowuzihdy
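
The ingredients named in the abstract (word counts, reversed complements, and groups of words sharing the same number of A or T nucleotides) can be sketched in Python as follows. This is a minimal illustration, not the authors' method: the function names are hypothetical, and `symmetry_gap` is a toy comparison introduced here for illustration, not one of the statistical measures developed in the paper. `word_probability` assumes the independent symbol model described in the abstract, with P(A) = P(T) = `p_a` and P(C) = P(G) = 0.5 - `p_a`.

```python
from collections import Counter

# Translation table for the Watson-Crick complement of each nucleotide.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reversed complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def count_words(seq, k):
    """Count overlapping words of length k, skipping words with non-ACGT symbols."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def at_count(word):
    """Number of A or T nucleotides in a word (defines its word group)."""
    return word.count("A") + word.count("T")


def word_probability(word, p_a):
    """Word probability under an independent symbol model (assumption:
    P(A) = P(T) = p_a and P(C) = P(G) = 0.5 - p_a)."""
    n_at = at_count(word)
    return p_a ** n_at * (0.5 - p_a) ** (len(word) - n_at)


def symmetry_gap(counts, word):
    """Toy comparison (NOT the paper's measure).

    Returns a pair:
      - absolute count difference between the word and its reversed complement
      - mean absolute count difference between the word and the other observed
        words in the same A/T group, or None if there are no such words
    """
    rc = reverse_complement(word)
    to_rc = abs(counts[word] - counts[rc])
    others = [
        n for w, n in counts.items()
        if w not in (word, rc) and len(w) == len(word)
        and at_count(w) == at_count(word)
    ]
    if not others:
        return to_rc, None
    to_group = sum(abs(counts[word] - n) for n in others) / len(others)
    return to_rc, to_group
```

Under the independent symbol model, `word_probability` depends only on the word length and its A/T count, so a word, its reversed complement, and every other word in the same group all receive the same probability. That is why symmetry alone is expected in such random sequences, and why the abstract contrasts a word with the rest of its group: in this toy comparison, a small first value together with a large second value from `symmetry_gap` points in the direction the abstract calls exceptional.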