Dissecting genomic determinants of positive selection with an evolution-guided regression model

Yi-Fei Huang
2020, bioRxiv pre-print
In evolutionary genomics, it is fundamentally important to understand how characteristics of genomic sequences, such as the expression level of a gene, determine the rate of adaptive evolution. While numerous statistical methods, such as the McDonald-Kreitman test, are available to examine the association between genomic features and positive selection, we currently lack a statistical approach to disentangle the direct effects of genomic features from the indirect effects mediated by ... factors. To address this problem, we present a novel statistical model, the MK regression, which augments the McDonald-Kreitman test with a generalized linear model. Analogous to the classic multiple regression model, the MK regression can analyze multiple genomic features simultaneously to distinguish between direct and indirect effects on positive selection. Using the MK regression, we identify numerous genomic features responsible for positive selection in chimpanzees, including local mutation rate, residue exposure level, gene expression level, tissue specificity, and metabolic genes. In particular, we show that highly expressed genes have a higher rate of adaptation than their weakly expressed counterparts, even though a higher expression level may impose stronger negative selection on protein sequences. Also, we observe that metabolic genes tend to have a higher rate of adaptation than their non-metabolic counterparts, possibly due to recent changes in diet in primate evolution. Overall, the MK regression is a powerful approach to elucidate the genomic basis of adaptation.
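The abstract does not spell out the MK regression's likelihood, so the sketch below is not that model. It illustrates two ideas the abstract builds on, under stated assumptions. First, it shows the standard estimate of the proportion of adaptive nonsynonymous substitutions obtained from McDonald-Kreitman counts, alpha = 1 - (Ds * Pn) / (Dn * Ps), where Pn/Ps are nonsynonymous/synonymous polymorphisms and Dn/Ds are nonsynonymous/synonymous divergences. Second, it shows why fitting several features jointly, as in multiple regression, separates direct from indirect effects. The data, the helper names (`mk_alpha`, `marginal_slope`, `joint_slopes`), and the use of ordinary least squares instead of a generalized linear model are all hypothetical simplifications.

```python
from typing import List, Tuple


def mk_alpha(pn: int, ps: int, dn: int, ds: int) -> float:
    """Standard McDonald-Kreitman estimate of the fraction of adaptive
    nonsynonymous substitutions: alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if ps == 0 or dn == 0:
        raise ValueError("Ps and Dn must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


def _center(v: List[float]) -> List[float]:
    # Subtract the mean so the intercept drops out of the normal equations.
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def marginal_slope(x: List[float], y: List[float]) -> float:
    """OLS slope of y on a single feature (with intercept)."""
    xc, yc = _center(x), _center(y)
    return _dot(xc, yc) / _dot(xc, xc)


def joint_slopes(x1: List[float], x2: List[float], y: List[float]) -> Tuple[float, float]:
    """OLS slopes for y ~ x1 + x2 (with intercept), solved with
    Cramer's rule on the centered 2x2 normal equations."""
    a, b, yc = _center(x1), _center(x2), _center(y)
    s11, s12, s22 = _dot(a, a), _dot(a, b), _dot(b, b)
    s1y, s2y = _dot(a, yc), _dot(b, yc)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are perfectly collinear")
    return ((s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)


# Hypothetical toy data: x2 is strongly correlated with x1,
# but the response y depends directly on x1 only.
x1 = [float(i) for i in range(10)]
x2 = [x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(x1)]
y = [2.0 * x for x in x1]

print(mk_alpha(pn=20, ps=40, dn=30, ds=30))  # 0.5
print(marginal_slope(x2, y))                 # 2.0: x2 looks important on its own
print(joint_slopes(x1, x2, y))               # (2.0, 0.0): x2's effect was indirect
```

Fitted alone, `x2` appears to have a strong effect. Fitted jointly with `x1`, its coefficient drops to zero, because its association with `y` was entirely mediated by `x1`. The abstract describes the MK regression as applying this joint-fitting logic to positive selection, within a generalized linear model rather than plain least squares.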
doi:10.1101/2020.11.24.396762 fatcat:uzqzd6ho7rgfxk5i4d7ygzunbe