A Bayesian nonparametric semi-supervised model for integration of multiple single-cell experiments [article]

Archit Verma, Barbara E Engelhardt
2020 bioRxiv preprint
Joint analysis of multiple single-cell RNA-sequencing (scRNA-seq) data sets is confounded by technical batch effects across experiments, biological or environmental variability across cells, and different capture processes across sequencing platforms. Manifold alignment is a principled, effective tool for integrating multiple data sets and controlling for confounding factors. We demonstrate that the semi-supervised t-distributed Gaussian process latent variable model (sstGPLVM), which projects the data onto a mixture of fixed and latent dimensions, can learn a unified low-dimensional embedding for multiple single-cell experiments with minimal assumptions. We show the efficacy of the model compared with state-of-the-art methods for single-cell data integration on simulated data, pancreas cells from four sequencing technologies, stem cells from male and female donors, and mouse brain cells from both spatial seqFISH+ and traditional scRNA-seq.
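The idea of projecting the data onto "a mixture of fixed and latent dimensions" can be pictured with a minimal sketch. It assumes that the fixed dimensions are known per-cell covariates, here hypothetical one-hot batch labels, and that these are concatenated with free latent coordinates to form the Gaussian process inputs. For simplicity, the sketch replaces the model's Student-t likelihood with a Gaussian one. It only evaluates the log marginal likelihood for given inputs and does not optimize the latent coordinates. It is not the authors' implementation, and all function names and parameters are hypothetical.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel between all rows of X
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def build_inputs(fixed, latent):
    # Hypothetical: concatenate known covariates (fixed dims) with free latent coordinates
    return np.hstack([fixed, latent])

def gplvm_log_marginal(Y, fixed, latent, noise=0.1, lengthscale=1.0, variance=1.0):
    # GPLVM log marginal likelihood with a Gaussian likelihood (a simplification of
    # the Student-t likelihood); the D output columns share one kernel and are independent
    N, D = Y.shape
    X = build_inputs(fixed, latent)
    K = rbf_kernel(X, lengthscale, variance) + noise * np.eye(N)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y via Cholesky factors
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * np.sum(Y * alpha) - 0.5 * D * logdet - 0.5 * N * D * np.log(2 * np.pi)

# Usage on random toy data (no real scRNA-seq values)
rng = np.random.default_rng(0)
N, D, Q, n_batches = 30, 5, 2, 3
batch = rng.integers(0, n_batches, size=N)
fixed = np.eye(n_batches)[batch]          # one-hot batch labels as fixed dimensions
latent = rng.normal(size=(N, Q))          # latent dimensions to be learned in practice
Y = rng.normal(size=(N, D))               # toy expression matrix: cells x genes
ll = gplvm_log_marginal(Y, fixed, latent)
```

In a full model, the latent coordinates (and kernel hyperparameters) would be optimized while the fixed columns stay clamped to their known values. That clamping is what lets known sources of variation, such as batch, be represented explicitly rather than absorbed into the learned embedding.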
doi:10.1101/2020.01.14.906313 fatcat:htsstup2njd4xhf667snjogo4e