Streamlining Data-Intensive Biology With Workflow Systems [article]

Taylor Reiter, Phillip T. Brooks, Luiz Irber, Shannon E.K. Joslin, Charles M. Reid, Camille Scott, C. Titus Brown, N. Tessa Pierce
2020 bioRxiv pre-print
As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. The maturation of data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps is reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of practices and strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis.
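One idea behind the "conditional execution of analysis steps" mentioned above is that a step reruns only when its outputs are missing or older than its inputs, which is the file-timestamp rule popularized by Make. The sketch below is a generic illustration of that rule, assuming timestamp-based staleness checks only. It is not the authors' code or the API of any particular workflow system, and the function names `needs_rerun` and `run_step` are hypothetical. Real systems typically also consider changes to code, parameters, and software environments.

```python
import os
from typing import Callable, Sequence


def needs_rerun(inputs: Sequence[str], outputs: Sequence[str]) -> bool:
    """Return True if any output is missing or older than the newest input.

    Hypothetical helper: a simplified, Make-style staleness check.
    Missing inputs raise FileNotFoundError via os.path.getmtime.
    """
    # A step must run if any of its declared outputs does not exist yet.
    if any(not os.path.exists(out) for out in outputs):
        return True
    # With no inputs, existing outputs are treated as up to date.
    if not inputs:
        return False
    newest_input = max(os.path.getmtime(inp) for inp in inputs)
    oldest_output = min(os.path.getmtime(out) for out in outputs)
    # Rerun only if some input was modified after the oldest output was written.
    return newest_input > oldest_output


def run_step(inputs: Sequence[str], outputs: Sequence[str],
             action: Callable[[], None]) -> bool:
    """Run `action` only when its outputs are stale; return True if it ran."""
    if needs_rerun(inputs, outputs):
        action()
        return True
    return False
```

Calling `run_step` twice in a row on the same files runs the action the first time (the output does not exist) and skips it the second time. After an input is modified, the next call runs the action again. A workflow system applies this check across a whole dependency graph, so that editing one parameter or input reruns only the affected downstream steps.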
doi:10.1101/2020.06.30.178673 fatcat:up6eozdxyjhlxmkllqa4deewfm