PREDICTD: PaRallel Epigenomics Data Imputation With Cloud-based Tensor Decomposition [article]

Timothy J Durham, Maxwell W Libbrecht, J Jeffry Howbert, Jeffrey Bilmes, William S Noble
2017 bioRxiv   pre-print
The Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project have produced thousands of data sets mapping the epigenome in hundreds of cell types. However, the number of cell types remains too great to comprehensively map given current time and financial constraints. We present a method, PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition (PREDICTD), to address this issue by computationally imputing missing experiments in collections of epigenomics experiments. PREDICTD leverages an intuitive and natural model called "tensor decomposition" to impute many experiments simultaneously. Compared with the current state-of-the-art method, ChromImpute, PREDICTD produces lower overall mean squared error, and combining methods yields further improvement. We show that PREDICTD data can be used to investigate enhancer biology at non-coding human accelerated regions. PREDICTD provides reference imputed data sets and open-source software for investigating new cell types, and demonstrates the utility of tensor decomposition and cloud computing, two technologies increasingly applicable in bioinformatics.
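The abstract does not spell out the model, so the following is only a generic illustration of how a tensor decomposition can impute missing entries, not PREDICTD's implementation. It assumes a small 3-way tensor with axes (cell type, assay, genomic position), a CP-style rank-2 factorization, and plain gradient descent on the observed entries. All names (`impute_tensor`, `observed`, `rank`, `lr`) and the synthetic data are hypothetical.

```python
import numpy as np

def impute_tensor(data, observed, rank=2, steps=3000, lr=0.05, seed=0):
    """Fit a CP-style factorization to the observed entries and
    return the full reconstructed tensor (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n_cell, n_assay, n_pos = data.shape
    # One factor matrix per tensor axis (assumed axes, not from the source).
    C = rng.uniform(0.1, 1.0, size=(n_cell, rank))
    A = rng.uniform(0.1, 1.0, size=(n_assay, rank))
    G = rng.uniform(0.1, 1.0, size=(n_pos, rank))
    n_obs = observed.sum()
    for _ in range(steps):
        pred = np.einsum("ir,jr,kr->ijk", C, A, G)
        # Residuals count only where an experiment was actually observed.
        resid = np.where(observed, pred - data, 0.0)
        grad_C = np.einsum("ijk,jr,kr->ir", resid, A, G) / n_obs
        grad_A = np.einsum("ijk,ir,kr->jr", resid, C, G) / n_obs
        grad_G = np.einsum("ijk,ir,jr->kr", resid, C, A) / n_obs
        C -= lr * grad_C
        A -= lr * grad_A
        G -= lr * grad_G
    return np.einsum("ir,jr,kr->ijk", C, A, G)

# Synthetic demo: a noise-free rank-2 tensor with about 20% of entries hidden.
rng = np.random.default_rng(1)
true_C = rng.uniform(0.5, 1.5, size=(4, 2))
true_A = rng.uniform(0.5, 1.5, size=(3, 2))
true_G = rng.uniform(0.5, 1.5, size=(20, 2))
data = np.einsum("ir,jr,kr->ijk", true_C, true_A, true_G)
observed = rng.random(data.shape) > 0.2

imputed = impute_tensor(data, observed)
mse_held_out = np.mean((imputed[~observed] - data[~observed]) ** 2)
print("held-out mean squared error:", mse_held_out)
```

Because every entry of the reconstruction is a product of shared factors, fitting the observed entries also yields predictions for all hidden ones at once, which is the sense in which such a model imputes many experiments simultaneously. The held-out mean squared error mirrors the kind of metric the abstract uses for comparison.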
doi:10.1101/123927 fatcat:nht4c5d3kfa6xpcxo42xiuxgme