120 Hits in 1.1 sec

Pipeline-Centric Provenance Model [article]

Paul Groth, Ewa Deelman, Gideon Juve, Gaurang Mehta, Bruce Berriman
2010 arXiv   pre-print
In this paper we propose a new provenance model which is tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We generalize the class of applications the approach is relevant to and propose a pipeline-centric provenance model. Finally, we evaluate the benefits in terms of storage needed by the approach when applied to an astronomy application.
arXiv:1005.4457v1 fatcat:6xz4ywkt6rgwnj64zmrbuduo6i

Scientific Workflows in the Cloud [chapter]

Gideon Juve, Ewa Deelman
2011 Grids, Clouds and Virtualization  
The development of cloud computing has generated significant interest in the scientific computing community. In this chapter we consider the impact of cloud computing on scientific workflow applications. We examine the benefits and drawbacks of cloud computing for workflows, and argue that the primary benefit of cloud computing is not the economic model it promotes, but rather the technologies it employs and how they enable new features for workflow applications. We describe how clouds can be configured to execute workflow tasks, and present a case study that examines the performance and cost of three typical workflow applications on Amazon EC2. Finally, we identify several areas in which existing clouds can be improved and discuss the future of workflows in the cloud.
doi:10.1007/978-0-85729-049-6_4 fatcat:423wdzax3jejtljpzjhcwhi27i

Creating A Galactic Plane Atlas With Amazon Web Services [article]

G. Bruce Berriman, Ewa Deelman, John Good, Gideon Juve, Jamie Kinney, Ann Merrihew, Mats Rynge
2013 arXiv   pre-print
Tools such as Wrangler can easily automate the provisioning and configuration of clusters running on Amazon EC2.  ...
arXiv:1312.6723v1 fatcat:vb4eikojzzdypivztoi6bnwnvy

Data Sharing Options for Scientific Workflows on Amazon EC2 [article]

Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Bruce Berriman, Benjamin P. Berman, Phil Maechling
2010 arXiv   pre-print
Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.
arXiv:1010.4822v1 fatcat:6oaurxdifrd7zgsaotxay6jy7m

Characterizing and profiling scientific workflows

Gideon Juve, Ann Chervenak, Ewa Deelman, Shishir Bharathi, Gaurang Mehta, Karan Vahi
2013 Future Generation Computer Systems  
Researchers working on the planning, scheduling, and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. This paper provides a characterization of workflows from six diverse scientific applications, including astronomy, bioinformatics, earthquake science, and gravitational-wave physics. The characterization is based on novel workflow profiling tools that provide detailed information about the various computational tasks that are present in the workflow. This information includes I/O, memory and computational characteristics. Although the workflows are diverse, there is evidence that each workflow has a job type that consumes the largest share of the runtime. The study also uncovered an inefficiency in a workflow component implementation, where the component was re-reading the same data multiple times.
doi:10.1016/j.future.2012.08.015 fatcat:ljsx7tntkjgg5ewbfkofh2szlq

Resource Provisioning Options for Large-Scale Scientific Workflows

Gideon Juve, Ewa Deelman
2008 2008 IEEE Fourth International Conference on eScience  
Scientists in many fields are developing large-scale workflows containing millions of tasks and requiring thousands of hours of aggregate computation time. Acquiring the computational resources to execute these workflows poses many challenges for application developers. Although the grid provides ready access to large pools of computational resources, the traditional approach to accessing these resources suffers from many overheads that lead to poor performance. In this paper we examine several techniques based on resource provisioning that may be used to reduce these overheads. These techniques include advance reservations, multi-level scheduling, and infrastructure as a service (IaaS). We explain the advantages and disadvantages of these techniques in terms of cost, performance and usability.
doi:10.1109/escience.2008.160 dblp:conf/eScience/JuveD08 fatcat:2j6xbpn6evhbjnz7zuq6wld7gm

Automating Application Deployment in Infrastructure Clouds

Gideon Juve, Ewa Deelman
2011 2011 IEEE Third International Conference on Cloud Computing Technology and Science  
Cloud computing systems are becoming an important platform for distributed applications in science and engineering. Infrastructure as a Service (IaaS) clouds provide the capability to provision virtual machines (VMs) on demand with a specific configuration of hardware resources, but they do not provide functionality for managing resources once they are provisioned. In order for such clouds to be used effectively, tools need to be developed that can help users to deploy their applications in the cloud. In this paper we describe a system we have developed to provision, configure, and manage virtual machine deployments in the cloud. We also describe our experiences using the system to provision resources for scientific workflow applications, and identify areas for further research.
doi:10.1109/cloudcom.2011.102 dblp:conf/cloudcom/JuveD11 fatcat:p6ex6duwwbfydiqjchjs6z25ui

Experiences with Resource Provisioning for Scientific Workflows Using Corral

Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta
2010 Scientific Programming  
Fig. 7. Small Montage workflow with 67 tasks.  ...
doi:10.1155/2010/208568 fatcat:w3uimqxy2ngkhihjijsdtcvode

Online Task Resource Consumption Prediction for Scientific Workflows

Rafael Ferreira da Silva, Gideon Juve, Mats Rynge, Ewa Deelman, Miron Livny
2015 Parallel Processing Letters  
Estimates of task runtime, disk space usage, and memory consumption, are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable workflow executions. Such algorithms often assume that accurate estimates are available, but such estimates are difficult to generate in practice. In this work, we first profile five real scientific workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize workflow task requirements based on these profiles. Our method estimates task runtime, disk space, and peak memory consumption based on the size of the tasks' input data. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets using a clustering technique. Task estimates are generated based on the ratio parameter/input data size if they are correlated, or based on the probability distribution function of the parameter. We then propose an online estimation process based on the MAPE-K loop, where task executions are monitored and estimates are updated as more information becomes available. Experimental results show that our online estimation process results in much more accurate predictions than an offline approach, where all task requirements are estimated prior to workflow execution.
doi:10.1142/s0129626415410030 fatcat:vff62a3irfcedojo4mdiz55cha
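
The correlation-then-fallback idea in the abstract above can be sketched in a few lines; this is a minimal illustration only, assuming a plain Pearson correlation test, a fixed threshold, and a median fallback in place of the paper's clustering and distribution fitting, and it is not the authors' implementation:

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def estimate_runtime(history, new_input_size, threshold=0.8):
    """Estimate a task's runtime from (input_size, runtime) history.

    If runtime correlates with input size, scale by the mean
    runtime/input-size ratio; otherwise fall back to the median of
    observed runtimes (standing in for the paper's distribution-based
    estimate on clustered subsets).
    """
    sizes = [s for s, _ in history]
    times = [t for _, t in history]
    if abs(pearson(sizes, times)) >= threshold:
        ratio = statistics.fmean(t / s for s, t in history)
        return ratio * new_input_size
    return statistics.median(times)

# Correlated history: runtime grows roughly linearly with input size,
# so the ratio-based estimator is used.
history = [(10, 20.0), (20, 41.0), (30, 59.0), (40, 82.0)]
print(estimate_runtime(history, 50))
```

In the online (MAPE-K) setting described by the paper, `history` would be appended to as monitored tasks complete, and estimates recomputed as more information becomes available.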

The Application of Cloud Computing to Astronomy: A Study of Cost and Performance [article]

G. Bruce Berriman, Ewa Deelman, Gideon Juve, Moira Regelson, Peter Plavchan
2010 arXiv   pre-print
Cloud computing is a powerful new technology that is widely used in the business world. Recently, we have been investigating the benefits it offers to scientific computing. We have used three workflow applications to compare the performance of processing data on the Amazon EC2 cloud with the performance on the Abe high-performance cluster at the National Center for Supercomputing Applications (NCSA). We show that the Amazon EC2 cloud offers better performance and value for processor- and memory-limited applications than for I/O-bound applications. We provide an example of how the cloud is well suited to the generation of a science product: an atlas of periodograms for the 210,000 light curves released by the NASA Kepler Mission. This atlas will support the identification of periodic signals, including those due to transiting exoplanets, in the Kepler data sets.
arXiv:1010.4813v1 fatcat:6j2l7c2d7zcg5jodnyed52sqla

A Tale Of 160 Scientists, Three Applications, A Workshop and A Cloud [article]

G. Bruce Berriman, Carolyn Brinkworth, Dawn Gelino, Dennis K. Wittman, Ewa Deelman, Gideon Juve, Mats Rynge, Jamie Kinney
2012 arXiv   pre-print
The NASA Exoplanet Science Institute (NExScI) hosts the annual Sagan Workshops, thematic meetings aimed at introducing researchers to the latest tools and methodologies in exoplanet research. The theme of the Summer 2012 workshop, held from July 23 to July 27 at Caltech, was to explore the use of exoplanet light curves to study planetary system architectures and atmospheres. A major part of the workshop was to use hands-on sessions to instruct attendees in the use of three open source tools for the analysis of light curves, especially from the Kepler mission. Each hands-on session involved the 160 attendees using their laptops to follow step-by-step tutorials given by experts. We describe how we used the Amazon Elastic Compute Cloud (EC2) to run these applications.
arXiv:1211.4055v1 fatcat:afcefibd3nfuvaea3c5zqixcqm

Storage-aware Algorithms for Scheduling of Workflow Ensembles in Clouds

Piotr Bryk, Maciej Malawski, Gideon Juve, Ewa Deelman
2015 Journal of Grid Computing  
Juve et al. [33] evaluate data sharing options on IaaS clouds.  ... 
doi:10.1007/s10723-015-9355-6 fatcat:hqnibpr7kfafjdddrumhzvhu3m

Rethinking data management for big data scientific workflows

Karan Vahi, Mats Rynge, Gideon Juve, Rajiv Mayani, Ewa Deelman
2013 2013 IEEE International Conference on Big Data  
Scientific workflows consist of tasks that operate on input data to generate new data products that are used by subsequent tasks. Workflow management systems typically stage data to computational sites before invoking the necessary computations. In some cases data may be accessed using remote I/O. There are limitations with these approaches, however. First, the storage at a computational site may be limited and not able to accommodate the necessary input and intermediate data. Second, even if there is enough storage, it is sometimes managed by a filesystem with limited scalability. In recent years, object stores have been shown to provide a scalable way to store and access large datasets; however, they provide a limited set of operations (retrieve, store and delete) that do not always match the requirements of the workflow tasks. In this paper, we show how scientific workflows can take advantage of the capabilities of object stores without requiring users to modify their workflow-based applications or scientific codes. We present two general approaches, one that exclusively uses object stores to store all the files accessed and generated by a workflow, while the other relies on the shared filesystem for caching intermediate data sets. We have implemented both of these approaches in the Pegasus Workflow Management System and have used them to execute workflows in a variety of execution environments, ranging from traditional supercomputing environments that have a shared filesystem to dynamic environments like Amazon AWS and the Open Science Grid that only offer remote object stores. As a result, Pegasus users can easily migrate their applications from a shared filesystem deployment to one using object stores without changing their application codes.
doi:10.1109/bigdata.2013.6691724 dblp:conf/bigdataconf/VahiRJMD13 fatcat:3varrrwktven7kgonqlzqp4eiq

Energy-Constrained Provisioning for Scientific Workflow Ensembles

Ilia Pietri, Maciej Malawski, Gideon Juve, Ewa Deelman, Jarek Nabrzyski, Rizos Sakellariou
2013 2013 International Conference on Cloud and Green Computing  
Large computational problems may often be modelled using multiple scientific workflows with similar structure. These workflows can be grouped into ensembles, which may be executed on distributed platforms such as the Cloud. In this paper, we focus on the provisioning of resources for scientific workflow ensembles and address the problem of meeting energy constraints along with either budget or deadline constraints. We propose and evaluate two energy-aware algorithms that can be used for provisioning and task scheduling. Experimental evaluation is based on simulations using synthetic data based on parameters of real scientific workflow applications. The results show that our proposed algorithms can meet constraints and minimize energy consumption without compromising the number of completed workflows in an ensemble.
doi:10.1109/cgc.2013.14 dblp:conf/cgc/PietriMJDNS13 fatcat:344tx7xan5hblfyxldcccr4exa

Pipeline-centric provenance model

Paul Groth, Ewa Deelman, Gideon Juve, Gaurang Mehta, Bruce Berriman
2009 Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science - WORKS '09  
In this paper we propose a new provenance model which is tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We generalize the class of applications the approach is relevant to and propose a pipeline-centric provenance model. Finally, we evaluate the benefits in terms of storage needed by the approach when applied to an astronomy application.
doi:10.1145/1645164.1645168 dblp:conf/sc/GrothDJMB09 fatcat:jzob27pnzrfflftq7d2n2lvkhu
Showing results 1 — 15 out of 120 results