Maximum likelihood estimation of natural selection and allele age from time series data of allele frequencies

Zhangyi He, Xiaoyang Dai, Mark Ashton Beaumont, Feng Yu
2019 bioRxiv   pre-print
Thanks to advances in ancient DNA preparation and sequencing techniques, time serial samples of segregating alleles are becoming more widely available in ancestral populations. Such time series data allow for more accurate inference of population genetic parameters and hypothesis testing on the recent action of natural selection. Here we develop a likelihood-based method for co-estimating the selection coefficient and the allele age from allele frequency time series data. Our method is built on the hidden Markov model incorporating the Wright-Fisher diffusion conditioned to survive until the time of the most recent sample, which circumvents the assumption required in existing methods that the allele is created by mutation at a certain small frequency. We calculate the likelihood by numerically solving the Kolmogorov backward equation resulting from the conditioned Wright-Fisher diffusion backwards in time and re-weighting the solution by the emission probabilities of the observation at each sampling time point, which allows for a reduction of the two-dimensional numerical search for the maximum of the likelihood surface for the selection coefficient and the allele age. We show through extensive simulations that our approach can produce unbiased estimates of the selection coefficient and the allele age, even if the samples are sparsely distributed in time with small uneven sizes. We illustrate the utility of our method on real data by re-analysing the ancient DNA data associated with horse coat colouration and show that grouping samples can significantly bias the results of inference.
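The general idea of a hidden Markov model likelihood for allele frequency time series can be illustrated with a much simpler stand-in than the method described in the abstract. The sketch below is not the authors' algorithm: it swaps the conditioned Wright-Fisher diffusion and the Kolmogorov backward equation for a discrete Wright-Fisher Markov chain run forwards in time, estimates only the selection coefficient (not the allele age), and assumes genic selection, binomial sampling, and a uniform distribution over allele counts at the first sampling time. All function names, the sample data, and the grid of candidate values are hypothetical and chosen for illustration only.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def transition_matrix(n_copies, s):
    """One-generation Wright-Fisher transition matrix (assumed genic selection, s > -1)."""
    rows = []
    for i in range(n_copies + 1):
        x = i / n_copies
        # Expected frequency after selection; the mutant has relative fitness 1 + s.
        x_sel = (1.0 + s) * x / (1.0 + s * x)
        rows.append([binom_pmf(j, n_copies, x_sel) for j in range(n_copies + 1)])
    return rows


def step(dist, matrix):
    """Propagate a distribution over allele counts by one generation."""
    out = [0.0] * len(dist)
    for i, weight in enumerate(dist):
        if weight == 0.0:
            continue
        row = matrix[i]
        for j in range(len(out)):
            out[j] += weight * row[j]
    return out


def log_likelihood(s, n_copies, samples):
    """Forward-algorithm log-likelihood of the samples for a given s.

    samples: list of (generation, sample_size, mutant_count), sorted by generation.
    """
    matrix = transition_matrix(n_copies, s)
    m = n_copies + 1
    dist = [1.0 / m] * m  # assumption: uniform over allele counts at the first sample
    loglik = 0.0
    prev_gen = samples[0][0]
    for gen, size, count in samples:
        for _ in range(gen - prev_gen):
            dist = step(dist, matrix)
        prev_gen = gen
        # Re-weight by the binomial emission probability of the observed sample.
        dist = [d * binom_pmf(count, size, i / n_copies) for i, d in enumerate(dist)]
        total = sum(dist)
        if total == 0.0:
            return float("-inf")
        loglik += math.log(total)
        dist = [d / total for d in dist]
    return loglik


def grid_mle(n_copies, samples, grid):
    """Return the grid value of s with the highest log-likelihood."""
    return max(grid, key=lambda s: log_likelihood(s, n_copies, samples))


# Hypothetical data: the mutant allele rises in frequency over 30 generations.
example_samples = [(0, 20, 2), (10, 20, 6), (20, 20, 12), (30, 20, 18)]
example_grid = [-0.1, -0.05, 0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]
s_hat = grid_mle(50, example_samples, example_grid)
print("grid estimate of s:", s_hat)
```

Working with ungrouped samples, each kept at its own generation as in `example_samples`, mirrors the abstract's point that grouping samples can bias inference. A one-dimensional grid search is used here only because this sketch has a single free parameter.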
doi:10.1101/837310 fatcat:sdr2j3zj5jcqppqaxb6xwbmoua