Bayesian multi-population haplotype inference via a hierarchical Dirichlet process mixture

Eric P. Xing, Kyung-Ah Sohn, Michael I. Jordan, Yee-Whye Teh
2006 Proceedings of the 23rd international conference on Machine learning - ICML '06  
Uncovering the haplotypes of single nucleotide polymorphisms and their population demography is essential for many biological and medical applications. The haplotype inference methods developed so far ignore the underlying population structure in the genotype data. These include methods based on coalescence, finite and infinite mixtures, and maximal parsimony. As noted by Pritchard (2001), different populations can share a certain portion of their genetic ancestors. They can also have their own genetic components through migration and diversification. In this paper, we address the problem of multi-population haplotype inference. We capture cross-population structure using a nonparametric Bayesian prior known as the hierarchical Dirichlet process (HDP) (Teh et al., 2006). We combine this prior with a recently developed Bayesian methodology for haplotype phasing known as DP-Haplotyper (Xing et al., 2004). We also develop an efficient sampling algorithm for the HDP based on a two-level nested Pólya urn scheme. We show that our model outperforms existing algorithms on both simulated and real biological data.
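One standard reading of a two-level nested Pólya urn for the HDP is the Chinese restaurant franchise from Teh et al. (2006). Here each population is a "restaurant" and each shared ancestral haplotype is a "dish". The following is a minimal sketch of drawing assignments from that prior alone. The function name `sample_crf` and its parameters `alpha` (per-population concentration) and `gamma` (shared top-level concentration) are hypothetical. This is not the authors' sampler. A full haplotype-inference sampler would also condition these draws on the observed genotypes.

```python
import random


def sample_crf(group_sizes, alpha, gamma, seed=0):
    """Draw global dish (ancestral-haplotype) labels from a Chinese
    restaurant franchise prior, i.e. a two-level nested Polya urn.

    group_sizes: number of draws (individuals) in each population
    alpha: concentration of each population-level DP (assumed > 0)
    gamma: concentration of the shared top-level DP (assumed > 0)
    Returns one list of global dish indices per population.
    """
    rng = random.Random(seed)
    dish_table_counts = []  # m_k: tables, across all populations, serving dish k
    assignments = []
    for n_j in group_sizes:
        table_sizes = []  # n_jt: customers seated at table t in this population
        table_dish = []   # k_jt: global dish served at table t
        labels = []
        for _ in range(n_j):
            # Level 1 (population urn): join table t w.p. proportional to n_jt,
            # or open a new table w.p. proportional to alpha.
            weights = table_sizes + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_sizes):
                # Level 2 (shared urn): the new table picks dish k w.p.
                # proportional to m_k, or a brand-new dish w.p. proportional to gamma.
                dish_weights = dish_table_counts + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            labels.append(table_dish[t])
        assignments.append(labels)
    return assignments


if __name__ == "__main__":
    # Three hypothetical populations of different sizes.
    for pop, labels in enumerate(sample_crf([5, 7, 3], alpha=1.0, gamma=1.0, seed=1)):
        print(pop, labels)
```

Dish indices are global, so the same index appearing in several populations corresponds to an ancestral haplotype that those populations share. Dishes that appear in only one population play the role of population-specific components.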
doi:10.1145/1143844.1143976