Probabilistic ancestry maps: a method to assess and visualize population substructures in genetics [article]

Héléna A Gaspar, Gerome Breen
2018 bioRxiv   pre-print
Principal component analysis (PCA) is a standard method to correct for population stratification in ancestry-specific genome-wide association studies (GWASs) and is used to cluster individuals by ancestry. Using the 1000 Genomes Project data, we examine how non-linear dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE) or generative topographic mapping (GTM) can be used to provide improved ancestry maps by accounting for a higher percentage of explained variance in ancestry, and how they can help to estimate the number of principal components necessary to account for population stratification. GTM also generates posterior probabilities of class membership, which can be used to assess the probability that an individual belongs to a given population; unlike t-SNE, GTM can be used for both clustering and classification. This paper is a first application of GTM for ancestry classification models. Our maps and software are available online.
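To illustrate the idea of posterior probabilities of class membership, here is a minimal Python sketch of one way per-individual responsibilities over map nodes could be turned into population probabilities. It is not the authors' software: the function names, the population labels `pop_A` and `pop_B`, and all numeric values are hypothetical, and the responsibilities are assumed to have been computed already by a fitted GTM.

```python
def node_class_probabilities(responsibilities, labels, classes):
    """Estimate P(class | node) from training responsibilities and known labels."""
    n_nodes = len(responsibilities[0])
    table = []
    for k in range(n_nodes):
        # Responsibility mass that each class contributes to node k.
        mass = {c: 0.0 for c in classes}
        for r, label in zip(responsibilities, labels):
            mass[label] += r[k]
        total = sum(mass.values())
        # Fall back to a uniform distribution for nodes that received no mass.
        table.append({c: (mass[c] / total if total > 0 else 1.0 / len(classes))
                      for c in classes})
    return table


def class_posterior(r_new, table, classes):
    """P(class | individual) = sum over nodes k of P(class | node k) * r_new[k]."""
    return {c: sum(table[k][c] * r_new[k] for k in range(len(r_new)))
            for c in classes}


# Hypothetical toy input: 4 labelled individuals, 3 map nodes.
# Each row is one individual's responsibilities and sums to 1.
classes = ["pop_A", "pop_B"]
train_r = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.3, 0.7], [0.1, 0.1, 0.8]]
train_labels = ["pop_A", "pop_A", "pop_B", "pop_B"]

table = node_class_probabilities(train_r, train_labels, classes)
posterior = class_posterior([0.7, 0.2, 0.1], table, classes)  # a new individual
predicted = max(posterior, key=posterior.get)
```

In this sketch the new individual sits mostly on a node dominated by `pop_A`, so that population receives the larger probability. Class priors are not reweighted here, which is a simplifying assumption.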
doi:10.1101/362343 fatcat:j37qikbcgrdjjafgolwebxwxcm