Project Dhaka: Variational Autoencoder for Unmasking Tumor Heterogeneity from Single Cell Genomic Data [article]

Sabrina Rashid, Sohrab Shah, Ziv Bar-Joseph, Ravi Pandya
2017 bioRxiv   pre-print
Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells vary in their gene expression, copy numbers, and mutations even when they come from a single tumor. Single cell sequencing of tumor cells is of paramount importance for unmasking the underlying tumor heterogeneity. However, extracting features from single cell genomic data that are coherent with the underlying biology is computationally challenging, given the extremely noisy and sparse nature of the data. Here we propose 'Dhaka', a variational autoencoder based single cell analysis tool that transforms genomic data to a latent encoded feature space that is more efficient in differentiating between the hidden tumor subpopulations. The technique generalizes across different types of genomic data, such as copy number variation from DNA sequencing and gene expression data from RNA sequencing. We tested the method on two gene expression datasets having 4K to 6K tumor cells and two copy number variation datasets having 250 to 260 tumor cells. Analysis of the encoded feature space revealed subpopulations of cells bearing distinct genomic signatures and the evolutionary relationships between them, which existing feature transformation methods such as t-SNE and PCA fail to do.
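As a rough illustration of what a variational autoencoder's latent encoding involves, the sketch below maps one cell's feature vector to the mean and log-variance of a diagonal Gaussian, samples a latent point with the reparameterization trick, and computes the KL term of the standard VAE objective. It is a minimal sketch, not the Dhaka implementation: the one-layer linear encoder, the random weights, the five input features, and the three latent dimensions are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def encode(x, w_mu, w_log_var):
    """Hypothetical one-layer linear encoder: returns (mu, log_var) for one cell."""
    mu = [sum(w * xi for w, xi in zip(row, x)) for row in w_mu]
    log_var = [sum(w * xi for w, xi in zip(row, x)) for row in w_log_var]
    return mu, log_var


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


# Illustrative sizes only; real data would have thousands of genes or bins.
rng = random.Random(0)
n_features, n_latent = 5, 3
x = [rng.random() for _ in range(n_features)]
w_mu = [[rng.gauss(0.0, 0.1) for _ in range(n_features)] for _ in range(n_latent)]
w_log_var = [[rng.gauss(0.0, 0.1) for _ in range(n_features)] for _ in range(n_latent)]

mu, log_var = encode(x, w_mu, w_log_var)
z = sample_latent(mu, log_var, rng)      # latent coordinates for this cell
kl = kl_to_standard_normal(mu, log_var)  # regularization term of the VAE loss
```

In a trained model the encoder weights would be learned jointly with a decoder by minimizing reconstruction error plus this KL term, and the latent coordinates of all cells would then be inspected for subpopulation structure.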
doi:10.1101/183863 fatcat:xejjnwz2bfeljdjjeaiad2wpdi