A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data [article]

Dimitrios V Vavoulis, Anthony Cutts, Jenny C Taylor, Anna Schuh
2020 bioRxiv   pre-print
Tumours are composed of genotypically and phenotypically distinct cancer cell populations (clones), which are subject to a process of Darwinian evolution in response to changes in their local micro-environment, such as drug treatment. In a cancer patient, this process of continuous adaptation can be studied through next-generation sequencing of multiple tumour samples combined with appropriate bioinformatics and statistical methodologies. One family of statistical methods for clonal
more » ... clonal deconvolution seeks to identify groups of mutations and estimate the prevalence of each group in the tumour, while taking into account its purity and copy number profile. These methods have been used in the analysis of cross-sectional data, as well as for longitudinal data by discarding information on the timing of sample collection. Two key questions are how (in the case of longitudinal data) can we incorporate such information in our analyses and if there is any benefit in doing so. Regarding the first question, we incorporated information on the temporal spacing of longitudinally collected samples into standard non-parametric approaches for clonal deconvolution by modelling the time dependence of the prevalence of each clone as a Gaussian process. This permitted reconstruction of the temporal profile of the abundance of each clone continuously from several sparsely collected samples and without any strong prior assumptions on the functional form of this profile. Regarding the second question, we tested various model configurations on a range of whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma and on synthetic data. We demonstrate that incorporating temporal information in our analysis improves model performance, as long as data of sufficient volume and complexity are available for estimating free model parameters. We expect that our approach will be useful in cases where collecting a relatively long sequence of tumour samples is feasible, as in the case of liquid cancers (e.g. leukaemia) and liquid biopsies. The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP.
doi:10.1101/2020.01.20.913236 fatcat:fp2efeb7bvepbky4c4fl5nx5fq