A Survey of Data-Intensive Scientific Workflow Management

Ji Liu, Esther Pacitti, Patrick Valduriez, Marta Mattoso
2015 Journal of Grid Computing  
A data-intensive scientific workflow is useful for modeling such process.  ...  Keywords scientific workflow · scientific workflow management system · grid · cloud · multisite cloud · distributed and parallel data management · scheduling · parallelization  ...  A Scientific Workflow Management System (SWfMS ) is an efficient tool to execute workflows and manage data sets in various computing environments.  ... 
doi:10.1007/s10723-015-9329-8 fatcat:5urst5aphjftbli3pukmnbutri

Data Management Challenges of Data-Intensive Scientific Workflows

Ewa Deelman, Ann Chervenak
2008 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID)  
However, many challenges remain in the area of data management related to workflow creation, execution, and result management.  ...  Much research to-date focuses on efficient, scalable, and robust workflow execution, especially in distributed environments.  ...  The authors would like to thank the ESG, LIGO, Montage, and SCEC collaborators for helpful discussions and fruitful collaborations.  ... 
doi:10.1109/ccgrid.2008.24 dblp:conf/ccgrid/DeelmanC08 fatcat:npefcon3tfftnku65yfnufrwey

Migrating Scientific Workflow Management Systems from the Grid to the Cloud [chapter]

Yong Zhao, Youfu Li, Ioan Raicu, Cui Lin, Wenhong Tian, Ruini Xue
2014 Cloud Computing for Data-Intensive Applications  
At the same time, scientific workflow management systems provide essential support and functionality to scientific computing, such as management of data and task dependencies, job scheduling and execution  ...  Migrating scientific workflow management systems from traditional Grid computing environments into the Cloud would enable a much broader user base to conduct their scientific research with ever increasing  ...  Acknowledgments This paper is supported by the key project of National Science Foundation of China No. 61034005 and No. 61272528.  ... 
doi:10.1007/978-1-4939-1905-5_10 fatcat:2hx7wzsucvdehdmr4fccd3yd2y

Integrating Policy with Scientific Workflow Management for Data-Intensive Applications

Ann L. Chervenak, David E. Smith, Weiwei Chen, Ewa Deelman
2012 2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
The results show performance improvements for a data-intensive workflow: the Montage astronomy workflow augmented to perform additional large data staging operations.  ...  As scientific applications generate and consume data at ever-increasing rates, scientific workflow systems that manage the growing complexity of analyses and data movement will increase in importance.  ...  ACKNOWLEDGMENT This work was supported by NSF under grant number IIS-0905032 and used the FutureGrid environment, which was supported by NSF grant number 0910812.  ... 
doi:10.1109/sc.companion.2012.29 dblp:conf/sc/ChervenakSCD12 fatcat:koxibtbh55d3rb4jij7qkg7nfy

Streamlining Data-Intensive Biology With Workflow Systems [article]

Taylor Reiter, Phillip T. Brooks, Luiz Irber, Shannon E.K. Joslin, Charles M. Reid, Camille Scott, C. Titus Brown, N. Tessa Pierce
2020 bioRxiv   pre-print
Here, we provide a series of practices and strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis.  ...  The maturation of data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis  ...  Acknowledgements Thank you to all the members and affiliates of the Lab for Data-Intensive Biology at UC Davis for providing valuable feedback on earlier versions of this manuscript and growing these practices  ... 
doi:10.1101/2020.06.30.178673 fatcat:up6eozdxyjhlxmkllqa4deewfm

Asterism: Pegasus and Dispel4py Hybrid Workflows for Data-Intensive Science

Rosa Filgueira, Rafael Ferreira da Silva, Amrey Krause, Ewa Deelman, Malcolm Atkinson
2016 2016 Seventh International Workshop on Data-Intensive Computing in the Clouds (DataCloud)  
Keywords Data-Intensive science, scientific workflows, stream-based system, deployment and reusability of execution environments  ...  We also present the Data-Intensive workflows as a Service (DIaaS) model, which enables easy data-intensive workflow composition and deployment on clouds using containers.  ...  We thank the NSF Chameleon Cloud for providing time grants to access their resources.  ... 
doi:10.1109/datacloud.2016.004 dblp:conf/sc/FilgueiraSKDA16 fatcat:efpt66w6tnho3p7a3j6eoqejva

Experiences with workflows for automating data-intensive bioinformatics

Ola Spjuth, Erik Bongcam-Rudloff, Guillermo Carrasco Hernández, Lukas Forer, Mario Giovacchini, Roman Valls Guimera, Aleksi Kallio, Eija Korpelainen, Maciej M Kańduła, Milko Krachunov, David P Kreil, Ognyan Kulev (+6 others)
2015 Biology Direct  
and carry out data management and analysis tasks on large scale.  ...  High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources  ...  MK and OK were supported by National Science Fund of Bulgaria within the "Methods for Data Analysis and Knowledge Discovery in Big Sequencing Dataset" project under contract DFNI02/7 of 12.12.2014.  ... 
doi:10.1186/s13062-015-0071-8 pmid:26282399 pmcid:PMC4539931 fatcat:cxotvdjwrndblm7gvu5myegsrq

Provenance for MapReduce-based data-intensive workflows

Daniel Crawl, Jianwu Wang, Ilkay Altintas
2011 Proceedings of the 6th workshop on Workflows in support of large-scale science - WORKS '11  
MapReduce has been widely adopted by many business and scientific applications for data-intensive processing of large datasets.  ...  There are increasing efforts for workflows and systems to work with the MapReduce programming model and the Hadoop environment including our work on a higher-level programming model for MapReduce within  ...  This work was supported by NSF SDCI Award OCI-0722079 for Kepler/CORE and ABI Award DBI-1062565 for bioKepler, DOE SciDAC Award DE-FC02-07ER25811 for SDM Center, the UCGRID Project, and an SDSC Triton  ... 
doi:10.1145/2110497.2110501 dblp:conf/sc/CrawlWA11 fatcat:stchkdubsvajxidqsfqckbqdfi

Confuga: Scalable Data Intensive Computing for POSIX Workflows

Patrick Donnelly, Nicholas Hazekamp, Douglas Thain
2015 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  
To address this gap, we introduce Confuga, a scalable data-intensive computing system that is largely compatible with the POSIX environment.  ...  This approach is highly effective when the objective is to compute relatively simple functions on colossal amounts of data, but it is not a good match for a scientific computing environment which depends  ...  See the URLs below for source code and workflows used in this paper. cctools/tree/papers/confuga-ccgrid2015  ... 
doi:10.1109/ccgrid.2015.95 dblp:conf/ccgrid/DonnellyHT15 fatcat:o6u7duptlvhrbbarmcfgcyy7be

Adaptive Caching for Data-Intensive Scientific Workflows in the Cloud [chapter]

Gaëtan Heidsieck, Daniel de Oliveira, Esther Pacitti, Christophe Pradal, François Tardieu, Patrick Valduriez
2019 Lecture Notes in Computer Science  
In this paper, we propose an adaptive caching solution for data-intensive workflows in the cloud.  ...  Since it is common for workflow users to reuse other workflows or data generated by other workflows, a promising approach for efficient workflow execution is to cache intermediate data and exploit it to  ...  IFB (ANR-11-INBS-0013) from the Agence Nationale de la Recherche and the France Grille Scientific Interest Group.  ... 
doi:10.1007/978-3-030-27618-8_33 fatcat:uliaxc3nmbamvibtx2noq4bfja

Automating Data-Throttling Analysis for Data-Intensive Workflows

Ricardo J. Rodríguez, Rafael Tolosana-Calasanz, Omer F. Rana
2012 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)  
The method obtains data-throttling values for the data transfer to enable network bandwidth and buffer/storage capacity to be managed more efficiently.  ...  We convert a DAG representation into a Petri net model and analyse the resulting graph using an iterative method to compute data-throttling values.  ...  A centralised approach utilises a central point for data transmission. This solution is not scalable, and occurs in systems where the time for data transfers is much smaller than computations.  ... 
doi:10.1109/ccgrid.2012.27 dblp:conf/ccgrid/RodriguezTR12 fatcat:muxb56yprfeljeer7nx6fvx24y

Scalable Deployment of a LIGO Physics Application on Public Clouds: Workflow Engine and Resource Provisioning Techniques [chapter]

Suraj Pandey, Letizia Sammut, Rodrigo N. Calheiros, Andrew Melatos, Rajkumar Buyya
2014 Cloud Computing for Data-Intensive Applications  
In order to provide users an automated and scalable platform for hosting scientific workflow applications, while hiding the complexity of the underlying Cloud infrastructure, we present the design and  ...  volume of data and high compute load, flash crowds, unpredictable load, and varying compute and storage requirements.  ...  the Workflow Engine for the LIGO experiment.  ... 
doi:10.1007/978-1-4939-1905-5_1 fatcat:enp3mljaszc4ld7lkip5ouusvi

Skyport - Container-Based Execution Environment Management for Multi-cloud Scientific Workflows

Wolfgang Gerlach, Wei Tang, Kevin Keegan, Travis Harrison, Andreas Wilke, Jared Bischof, Mark DSouza, Scott Devoid, Daniel Murphy-Olson, Narayan Desai, Folker Meyer
2014 2014 5th International Workshop on Data-Intensive Computing in the Clouds  
As an extension to AWE/Shock, our data analysis platform that provides scalable workflow execution environments for scientific data in the cloud, Skyport greatly reduces the complexity associated with  ...  providing the environment necessary to execute complex workflows.  ...  Storage and analysis of such data has made it necessary to exploit grid and cloud computing resources with efficient workflow management systems, making it possible to process data quickly while at the  ... 
doi:10.1109/datacloud.2014.6 dblp:conf/sc/GerlachTKHWBDDM14 fatcat:izi4g67uknh7hbemoyilt4cx7a

On the use of burst buffers for accelerating data-intensive scientific workflows

Rafael Ferreira da Silva, Scott Callaghan, Ewa Deelman
2017 Proceedings of the 12th Workshop on Workflows in Support of Large-Scale Science - WORKS '17  
Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping  ...  up with compute and memory performance.  ...  In a recent survey on the management of data-intensive workflows [19] , several techniques and strategies, including scheduling and parallel processing, are presented on how workflow systems manage data-intensive  ... 
doi:10.1145/3150994.3151000 dblp:conf/sc/SilvaCD17 fatcat:tomvvnfqgnbdlbjyl67x4u7z5y

Data-Intensive Workflow Optimization Based on Application Task Graph Partitioning in Heterogeneous Computing Systems

Saima Gulzar Ahmad, Chee Sun Liew, M. Mustafa Rafique, Ehsan Ullah Munir, Samee U. Khan
2014 2014 IEEE Fourth International Conference on Big Data and Cloud Computing  
This paper presents a dual objective Partitioning based Data-intensive Workflow optimization Algorithm (PDWA) for heterogeneous computing systems.  ...  Optimization of these performance metrics in heterogeneous computing environment becomes more challenging due to the difference in the computing capacity of execution nodes and variations in the data transfer  ...  ACKNOWLEDGMENT The work presented in this paper is supported by the Ministry of Education Malaysia (FRGS FP051-2013A and UMRG RP001F-13ICT).  ... 
doi:10.1109/bdcloud.2014.63 dblp:conf/bdcloud/AhmadLRMK14 fatcat:q4vegly4tbg5joh6po3miz3j2e