Latent representation of the human pan-celltype epigenome through a deep recurrent neural network [article]

Kevin Bradley Dsouza, Adam Yifan Li, Vijay K Bhargava, Maxwell W Libbrecht
2021 bioRxiv   pre-print
The availability of thousands of assays of epigenetic activity necessitates compressed representations of these data sets that summarize the epigenetic landscape of the genome. Until recently, most such representations were cell type-specific, applying to a single tissue or cell state. Neural networks have since made it possible to summarize data across tissues to produce a pan-celltype representation. In this work, we propose Epi-LSTM, a deep long short-term memory (LSTM) recurrent neural network autoencoder to capture the long-term dependencies in the epigenomic data. The latent representations from Epi-LSTM capture a variety of genomic phenomena, including gene expression, promoter-enhancer interactions, replication timing, frequently interacting regions and evolutionary conservation. These representations outperform existing methods in a majority of cell types, while yielding smoother representations along the genomic axis due to their sequential nature.
doi:10.1101/2021.03.08.434446 fatcat:vpyn476bnvdgvc7lna32ayjjl4