Optimizing High Performance Big Data Cancer Workflows

Ivan Jimenez-Ruiz, Ricardo Gonzalez-Mendez, Alexander Ropelewski
2017 Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact - PEARC17  
Appropriate optimization of bioinformatics workflows is vital to improve the timely discovery of variants implicated in cancer genomics. Sequenced human brain tumor data was assembled to optimize tool implementations and run various components of RNA sequence (RNA-seq) workflows. The measurable information produced by these tools accounts for the success rate and overall efficiency of a standardized and simultaneous analysis. We used the National Center for Biotechnology Information Sequence Read Archive (NCBI-SRA) database to retrieve two transcriptomic datasets containing over 104 million reads as input data. We used these datasets to benchmark various file systems on the Bridges supercomputer to improve overall workflow throughput. Based on program and job timings, we report critical recommendations on selections of appropriate file systems and node types to efficiently execute these workflows.
doi:10.1145/3093338.3093372 dblp:conf/xsede/Jimenez-RuizGR17 fatcat:lzpwkbqoafaexlj4o5x2rhcv34