Accurate estimation of single cell allele-specific gene expression using all reads and combining information across cells [article]

Kwangbom Choi, Narayanan Raghupathy, Gary A Churchill
2018 bioRxiv pre-print
Single-cell RNA sequencing (scRNA-Seq) can reveal features of cellular gene expression that cannot be observed in whole-tissue analysis. Allele-specific expression in single cells can provide an even richer picture of the stochastic and dynamic features of gene expression. Single-cell technologies are moving toward sequencing larger numbers of cells with low depth of coverage per cell. Low coverage results in increased sampling variability and frequent occurrence of zero counts for genes that are expressed at low levels or that are dynamically expressed in short bursts. The problems associated with low coverage are exacerbated in allele-specific analysis by the almost universal practice of discarding reads that cannot be unambiguously aligned to one allele of one gene (multi-reads). We demonstrate that discarding multi-reads leads to higher variability in estimates of allelic proportions and an increased frequency of sampling zeros, and can lead to spurious findings of dynamic and monoallelic gene expression. We propose a weighted-allocation method of counting reads that substantially improves estimation of allelic proportions and reduces spurious zeros in the allele-specific read counts. We further demonstrate that combining information across cells using a hierarchical mixture model reduces sampling variability without sacrificing cell-to-cell heterogeneity. We applied our approach to track changes in the allele-specific expression patterns of cells sampled over a developmental time course. We implemented these methods in extensible open-source software scBASE, which is available at https://github.com/churchill-lab/scBASE.
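To illustrate the contrast between discarding multi-reads and weighted allocation, the following is a minimal sketch and not the scBASE implementation. It assumes each read is summarized by the set of targets (gene-allele pairs) it aligns to, and it uses a generic expectation-maximization scheme that splits each multi-read among its compatible targets in proportion to their current estimated abundance. The function names `unique_counts` and `weighted_allocation`, the target labels, and the toy read set are all hypothetical.

```python
from collections import Counter


def unique_counts(reads, targets):
    """Count only reads that align to exactly one target (multi-reads discarded)."""
    counts = {t: 0.0 for t in targets}
    for compat in reads:
        if len(compat) == 1:
            counts[next(iter(compat))] += 1.0
    return counts


def weighted_allocation(reads, targets, n_iter=200, tol=1e-9):
    """Allocate every read fractionally among its compatible targets.

    Generic EM sketch (an assumption, not the authors' algorithm): weights are
    proportional to the current abundance estimate of each compatible target.
    """
    # Collapse reads with identical compatibility sets into classes.
    classes = Counter(frozenset(r) for r in reads)
    total = sum(classes.values())
    counts = {t: 0.0 for t in targets}
    if total == 0:
        return counts

    # Start from uniform abundances.
    theta = {t: 1.0 / len(targets) for t in targets}
    for _ in range(n_iter):
        # E-step: split each read class by the current relative abundances.
        counts = {t: 0.0 for t in targets}
        for compat, n in classes.items():
            denom = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += n * theta[t] / denom
        # M-step: re-estimate abundances from the expected counts.
        new_theta = {t: counts[t] / total for t in targets}
        delta = max(abs(new_theta[t] - theta[t]) for t in targets)
        theta = new_theta
        if delta < tol:
            break
    return counts


# Toy data (hypothetical): one gene, two alleles, four allele-ambiguous reads.
targets = ["geneX_A", "geneX_B"]
reads = [{"geneX_A"}] * 6 + [{"geneX_B"}] * 2 + [{"geneX_A", "geneX_B"}] * 4

print(unique_counts(reads, targets))        # uses 8 of the 12 reads
print(weighted_allocation(reads, targets))  # uses all 12 reads
```

In this toy case the estimated allelic proportion is the same under both schemes, but weighted allocation retains all 12 reads instead of 8. Reads shared across genes, and the pooling of information across cells through a hierarchical mixture model, are not covered by this sketch.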
doi:10.1101/383224 fatcat:5rzeqg37zvg7hnqarkvwavjqri