A framework for scientific workflow reproducibility in the cloud

Rawaa Qasha, Jacek Cala, Paul Watson
2016 2016 IEEE 12th International Conference on e-Science (e-Science)  
Workflow is a well-established means by which to capture scientific methods in an abstract graph of interrelated processing tasks. The reproducibility of scientific workflows is therefore fundamental to reproducible e-Science. However, the ability to record all the required details so as to make a workflow fully reproducible is a long-standing problem that is very difficult to solve. In this paper, we introduce an approach that integrates system description, source control, container management
more » ... and automatic deployment techniques to facilitate workflow reproducibility. We have developed a framework that leverages this integration to support workflow execution, re-execution and reproducibility in the cloud and in a personal computing environment. We demonstrate the effectiveness of our approach by examining various aspects of repeatability and reproducibility on real scientific workflows. The framework allows workflow and task images to be captured automatically, which improves not only repeatability but also runtime performance. It also gives workflows portability across different cloud environments. Finally, the framework can also track changes in the development of tasks and workflows to protect them from unintentional failures.
doi:10.1109/escience.2016.7870888 dblp:conf/eScience/QashaCW16 fatcat:mtz4skna25eqlccwkrsthoqn2a