Efficient Bayesian inference of phylogenetic trees from large scale, low-depth genome-wide single-cell data [article]

Fatemeh Dorri, Sohrab Salehi, Kevin Chern, Tyler Funnell, Marc Williams, Daniel Lai, Mirela Andronescu, Kieran R Campbell, Andrew McPherson, Samuel Aparicio, Andrew Roth, Sohrab P Shah (+1 others)
2020 bioRxiv   pre-print
A new generation of scalable single-cell whole genome sequencing (scWGS) methods allows unprecedented high-resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing mutational processes. The ability to sequence tens of thousands of single genomes at high resolution per experiment is challenging the assumptions and scalability of existing phylogenetic tree building methods and calls for tailored phylogenetic models and scalable inference algorithms. We propose a phylogenetic model and associated Bayesian inference procedure that exploits the specifics of scWGS data. A first highlight of our approach is a novel phylogenetic encoding of copy-number data, which provides an attractive statistical-computational trade-off by simplifying the site dependencies induced by rearrangements while still forming a sound foundation for phylogenetic inference. A second highlight is an innovative phylogenetic tree exploration move that bounds the cost of each MCMC iteration by O(|C| + |L|), where |C| is the number of cells and |L| is the number of loci. In contrast, existing off-the-shelf likelihood-based methods incur an iteration cost of O(|C| |L|). Moreover, the novel move considers an exponential number of neighbouring trees, whereas off-the-shelf moves consider a polynomial-size set of neighbours. The third highlight is a novel mutation calling method that incorporates the copy-number data and the underlying phylogenetic tree to overcome the missing data issue. This framework allows us to realistically consider routine Bayesian phylogenetic inference at the scale of scWGS data.
doi:10.1101/2020.05.06.058180 fatcat:4s6zvqzygrazjol35mobvg7gcu
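
To give a rough sense of the gap between the two per-iteration costs quoted in the abstract, the sketch below compares the bare growth terms |C| + |L| and |C| |L| for hypothetical data sizes. The function name and the example sizes are illustrative assumptions, not values or code from the paper, and constant factors are ignored.

```python
def per_iteration_cost(num_cells: int, num_loci: int) -> tuple[int, int]:
    """Return the two growth terms from the abstract as plain counts.

    The first value is |C| + |L| (the bound stated for the proposed move),
    the second is |C| * |L| (the cost stated for off-the-shelf methods).
    These are asymptotic proxies only; real constants are unknown here.
    """
    additive = num_cells + num_loci
    multiplicative = num_cells * num_loci
    return additive, multiplicative


# Hypothetical sizes chosen for illustration (not taken from the paper).
additive, multiplicative = per_iteration_cost(num_cells=10_000, num_loci=5_000)
ratio = multiplicative / additive
print(f"|C| + |L| = {additive}, |C||L| = {multiplicative}, ratio = {ratio:.0f}")
```

Under these assumed sizes the product term is several thousand times larger than the sum term, which is the kind of gap the abstract's scalability argument rests on.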