Physicochemical amino acid properties better describe substitution rates in large populations
Molecular biology and evolution
Substitutions between chemically distant amino acids are known to occur less frequently than those between more similar amino acids. This knowledge, however, is not reflected in most codon substitution models, which treat all nonsynonymous changes as if they were equivalent in terms of impact on the protein. A variety of methods for integrating chemical distances into models have been proposed, with a common approach being to divide substitutions into radical or conservative categories.
... categories. Nevertheless, it remains unclear whether the resulting models describe sequence evolution better than their simpler counterparts. We propose a parametric codon model that distinguishes between radical and conservative substitutions, allowing us to assess if radical substitutions are preferentially removed by selection. Applying our new model to a range of phylogenomic data, we find differentiating between radical and conservative substitutions provides significantly better fit for large populations, but see no equivalent improvement for smaller populations. Comparing codon and amino acid models using these same data shows that alignments from large populations tend to select phylogenetic models containing information about amino acid exchangeabilities, whereas the structure of the genetic code is more important for smaller populations. Our results suggest selection against radical substitutions is, on average, more pronounced in large populations than smaller ones. The reduced observable effect of selection in smaller populations may be due to stronger genetic drift making it more challenging to detect preferences. Our results imply an important connection between the life history of a phylogenetic group and the model that best describes its evolution.