A Framework for the Design and Reuse of Grid Workflows [chapter]

Ilkay Altintas, Adam Birnbaum, Kim K. Baldridge, Wibke Sudholt, Mark Miller, Celine Amoreira, Yohann Potier, Bertram Ludaescher
2005 Lecture Notes in Computer Science  
Grid workflows can be seen as special scientific workflows involving high performance and/or high throughput computational tasks. Much work in grid workflows has focused on improving application performance through schedulers that optimize the use of computational resources and bandwidth. As high-end computing resources are becoming more of a commodity that is available to new scientific communities, there is an increasing need to also improve the design and reusability "performance" of
... ic workflow systems. To this end, we are developing a framework that supports the design and reuse of grid workflows. Individual workflow components (e.g., for data movement, database querying, job scheduling, remote execution, etc.) are abstracted into a set of generic, reusable tasks. Instantiations of these common tasks can be functionally equivalent atomic components (called actors) or composite components (so-called composite actors or subworkflows). In this way, a grid workflow designer does not have to commit to a particular Grid technology when developing a scientific workflow; instead, different technologies (e.g., GridFTP, SRB, and scp) can be used interchangeably and in concert. We illustrate the application of our framework using two real-world Grid workflows from different scientific domains, i.e., cheminformatics and bioinformatics, respectively.
With the increase in the volume of scientific data and knowledge, the demand to utilize the largest portion thereof in an efficient and simple way has become one of the main challenges in today's science. Many scientific domains need computing methods and resources for continued improvement of the quality of their research. Important examples include computational problems in bio- and cheminformatics. Technical challenges also arise through the introduction of different, heterogeneous distributed network computing systems that make up the Grid [1, 2]. While an increasing number of computational tools for the Grid become available, they are generally difficult to
doi:10.1007/11423287_11 fatcat:fgxytu53wbbprgkhnff6mp3ep4