Integrating Policy with Scientific Workflow Management for Data-Intensive Applications

Ann L. Chervenak, David E. Smith, Weiwei Chen, Ewa Deelman
2012 2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
As scientific applications generate and consume data at ever-increasing rates, scientific workflow systems that manage the growing complexity of analyses and data movement will increase in importance. The goal of our work is to improve the overall performance of scientific workflows by using policy to improve data staging into and out of computational resources. We developed a Policy Service that gives advice to the workflow system about how to stage data, including advice on the order of data transfers and on transfer parameters. The Policy Service gives this advice based on its knowledge of ongoing transfers, recent transfer performance, and the current allocation of resources for data staging. The paper describes the architecture of the Policy Service and its integration with the Pegasus Workflow Management System. It employs a range of policies for data staging, and presents performance results for one policy that does a greedy allocation of data transfer streams between source and destination sites. The results show performance improvements for a data-intensive workflow: the Montage astronomy workflow augmented to perform additional large data staging operations.
Index Terms: data placement, scientific workflow, policy service, greedy allocation policy.
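The greedy allocation of transfer streams mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no algorithmic detail, so the class name `GreedyStreamAllocator`, the per-pair budget `max_streams_per_pair`, the site names, and the rule "grant as many requested streams as the remaining budget allows" are all assumptions made for illustration.

```python
from collections import defaultdict


class GreedyStreamAllocator:
    """Hypothetical greedy allocator of parallel transfer streams.

    Assumption: each (source, destination) site pair has a fixed budget of
    streams, and each request is granted as many streams as remain.
    """

    def __init__(self, max_streams_per_pair):
        self.max_streams_per_pair = max_streams_per_pair
        # (source, destination) -> number of streams currently granted
        self.in_use = defaultdict(int)

    def request(self, source, destination, requested):
        """Grant min(requested, remaining budget) streams for this site pair."""
        pair = (source, destination)
        available = self.max_streams_per_pair - self.in_use[pair]
        granted = max(0, min(requested, available))
        self.in_use[pair] += granted
        return granted

    def release(self, source, destination, granted):
        """Return streams to the pair's budget when a transfer completes."""
        pair = (source, destination)
        self.in_use[pair] = max(0, self.in_use[pair] - granted)


# Illustrative usage with made-up site names and a budget of 8 streams.
allocator = GreedyStreamAllocator(max_streams_per_pair=8)
first = allocator.request("siteA", "siteB", 6)   # budget is untouched
second = allocator.request("siteA", "siteB", 6)  # only part of the budget remains
third = allocator.request("siteA", "siteB", 6)   # budget is exhausted
allocator.release("siteA", "siteB", first)       # first transfer finishes
fourth = allocator.request("siteA", "siteB", 6)  # freed streams are reusable
```

Under these assumptions, early requests receive their full stream count and later ones receive whatever is left, which is the defining trait of a greedy policy. How a real service should treat a request that is granted zero streams (queue it, or fall back to a single stream) is a design choice this sketch leaves open.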
doi:10.1109/sc.companion.2012.29 dblp:conf/sc/ChervenakSCD12 fatcat:koxibtbh55d3rb4jij7qkg7nfy