Style transfer with variational autoencoders is a promising approach to RNA-Seq data harmonization and analysis [article]

Nikolai E. Russkikh, Denis V. Antonets, Dmitry N. Shtokalo, Alexander V. Makarov, Alexey M. Zakharov, Evgeny V. Terentev
2019 bioRxiv   pre-print
Transcriptomic data are frequently used to search for biomarker genes of various diseases and biological states. Researchers generally have data from hundreds, rarely thousands, of specimens at hand. In most cases, the proposed candidate biomarker genes and corresponding decision rules fail in prospective studies, especially for diseases with a complex polygenic background. Naively adding training data usually does not improve performance either, because of batch effects resulting from various discrepancies between datasets. To better understand the factors underlying the observed variation in gene expression data, we applied a style transfer technique. Most style transfer studies focus on image data and, to our knowledge, this is the first attempt to adapt the procedure to the gene expression domain. The style component can be either technical factors of data variance, such as the sequencing platform or RNA extraction protocol, or any biological details about the samples that we would like to control (gender, biological state, treatment, etc.). The proposed solution is based on a variational autoencoder artificial neural network. To disentangle the style components, we trained the encoder against a discriminator in an adversarial manner. This approach can be useful both for data harmonization and for data augmentation, yielding semisynthetic samples when real data are scarce. We demonstrated the applicability of our framework on single-cell RNA-Seq data from the Mouse Cell Atlas, where we were able to transfer the mammary gland biological state (virgin, pregnancy, and involution) between samples while preserving the semantics (cell types) and producing biologically relevant gene expression changes.
doi:10.1101/791962 fatcat:pqemdrhapzfo7hxefdchmvbp3m
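
The style transfer step described in the abstract can be illustrated with a minimal sketch: a sample is encoded into a latent code and then decoded under a different style label. This is not the authors' implementation. The toy sizes, the random untrained weights, the single linear layers, and all function names (`encode`, `decode`, `transfer_style`, and so on) are assumptions made for illustration, and the adversarial discriminator and the training loop are omitted entirely.

```python
import math
import random

STYLES = ["virgin", "pregnancy", "involution"]  # biological states named in the abstract
N_GENES, N_LATENT = 6, 2                        # toy sizes, chosen only for illustration


def one_hot(style):
    """Encode a style label as a one-hot vector."""
    return [1.0 if s == style else 0.0 for s in STYLES]


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Random, untrained parameters (assumption: a real model would learn these)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def encode(params, expression):
    """Map an expression profile to the mean and log-variance of the latent code."""
    mu = linear(*params["enc_mu"], expression)
    log_var = linear(*params["enc_log_var"], expression)
    return mu, log_var


def decode(params, z, style):
    """Reconstruct expression from the latent code plus an explicit style label."""
    return linear(*params["dec"], z + one_hot(style))


def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def transfer_style(params, expression, target_style):
    """Encode a sample, then decode it under a different style label."""
    mu, _ = encode(params, expression)  # use the posterior mean, no sampling
    return decode(params, mu, target_style)


rng = random.Random(0)
params = {
    "enc_mu": init_layer(rng, N_LATENT, N_GENES),
    "enc_log_var": init_layer(rng, N_LATENT, N_GENES),
    "dec": init_layer(rng, N_GENES, N_LATENT + len(STYLES)),
}

sample = [rng.random() for _ in range(N_GENES)]  # stand-in for one normalized profile
as_pregnancy = transfer_style(params, sample, "pregnancy")
as_involution = transfer_style(params, sample, "involution")
```

In this sketch the latent code is shared between the two decoded profiles and only the style label differs. In the approach the abstract describes, adversarial training of the encoder against a discriminator is what would keep style information out of the latent code, so that swapping the label changes the style while the semantics (for example, cell type) are preserved.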