A Survey of Data-Intensive Scientific Workflow Management
2015
Journal of Grid Computing
A data-intensive scientific workflow is useful for modeling such a process. ...
Keywords scientific workflow · scientific workflow management system · grid · cloud · multisite cloud · distributed and parallel data management · scheduling · parallelization ...
A Scientific Workflow Management System (SWfMS) is an efficient tool to execute workflows and manage data sets in various computing environments. ...
doi:10.1007/s10723-015-9329-8
fatcat:5urst5aphjftbli3pukmnbutri
Data Management Challenges of Data-Intensive Scientific Workflows
2008
2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID)
However, many challenges remain in the area of data management related to workflow creation, execution, and result management. ...
Much research to-date focuses on efficient, scalable, and robust workflow execution, especially in distributed environments. ...
The authors would like to thank the ESG, LIGO, Montage, and SCEC collaborators for helpful discussions and fruitful collaborations. ...
doi:10.1109/ccgrid.2008.24
dblp:conf/ccgrid/DeelmanC08
fatcat:npefcon3tfftnku65yfnufrwey
Migrating Scientific Workflow Management Systems from the Grid to the Cloud
[chapter]
2014
Cloud Computing for Data-Intensive Applications
At the same time, scientific workflow management systems provide essential support and functionality to scientific computing, such as management of data and task dependencies, job scheduling and execution ...
Migrating scientific workflow management systems from traditional Grid computing environments into the Cloud would enable a much broader user base to conduct their scientific research with ever increasing ...
Acknowledgments This paper is supported by the key projects of the National Science Foundation of China, No. 61034005 and No. 61272528. ...
doi:10.1007/978-1-4939-1905-5_10
fatcat:2hx7wzsucvdehdmr4fccd3yd2y
Integrating Policy with Scientific Workflow Management for Data-Intensive Applications
2012
2012 SC Companion: High Performance Computing, Networking Storage and Analysis
The results show performance improvements for a data-intensive workflow: the Montage astronomy workflow augmented to perform additional large data staging operations. ...
As scientific applications generate and consume data at ever-increasing rates, scientific workflow systems that manage the growing complexity of analyses and data movement will increase in importance. ...
ACKNOWLEDGMENT This work was supported by NSF under grant number IIS-0905032 and used the FutureGrid environment, which was supported by NSF grant number 0910812. ...
doi:10.1109/sc.companion.2012.29
dblp:conf/sc/ChervenakSCD12
fatcat:koxibtbh55d3rb4jij7qkg7nfy
Streamlining Data-Intensive Biology With Workflow Systems
[article]
2020
bioRxiv
pre-print
Here, we provide a series of practices and strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. ...
The maturation of data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps is reshaping the landscape of biological data analysis ...
Acknowledgements Thank you to all the members and affiliates of the Lab for Data-Intensive Biology at UC Davis for providing valuable feedback on earlier versions of this manuscript and growing these practices ...
doi:10.1101/2020.06.30.178673
fatcat:up6eozdxyjhlxmkllqa4deewfm
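The practices described in this entry revolve around workflow systems that skip analysis steps whose outputs are already up to date. As a rough, hedged illustration of that conditional-execution idea (not code from the paper; the Rule class, file names, and the trim_tool command are placeholders):

```python
import os
import subprocess

class Rule:
    """Toy workflow rule: a shell command with declared inputs and one output."""
    def __init__(self, output, inputs, command):
        self.output, self.inputs, self.command = output, inputs, command

    def is_stale(self):
        # Re-run when the output is missing or older than any input.
        if not os.path.exists(self.output):
            return True
        out_mtime = os.path.getmtime(self.output)
        return any(os.path.getmtime(i) > out_mtime for i in self.inputs)

def run_if_stale(rule):
    """Conditional execution: only run the command when the rule is stale."""
    if rule.is_stale():
        subprocess.run(rule.command, shell=True, check=True)
    else:
        print(f"{rule.output}: up to date, skipping")

# Hypothetical usage (assumes a `trim_tool` command and raw.fastq exist):
# run_if_stale(Rule("trimmed.fastq", ["raw.fastq"],
#                   "trim_tool raw.fastq > trimmed.fastq"))
```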
Asterism: Pegasus and Dispel4py Hybrid Workflows for Data-Intensive Science
2016
2016 Seventh International Workshop on Data-Intensive Computing in the Clouds (DataCloud)
Keywords Data-Intensive science, scientific workflows, stream-based system, deployment and reusability of execution environments ...
We also present the Data-Intensive workflows as a Service (DIaaS) model, which enables easy data-intensive workflow composition and deployment on clouds using containers. ...
We thank the NSF Chameleon Cloud for providing time grants to access their resources. ...
doi:10.1109/datacloud.2016.004
dblp:conf/sc/FilgueiraSKDA16
fatcat:efpt66w6tnho3p7a3j6eoqejva
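The DIaaS model mentioned in this entry packages workflow steps in containers. A minimal, hedged sketch of that pattern, assuming Docker is available; the image name, command, and paths are placeholders rather than actual Asterism components:

```python
import subprocess

def run_task_in_container(image, command, host_dir, work_dir="/data"):
    """Run one workflow task inside a container, mounting a host directory
    so that inputs and outputs are shared with the rest of the workflow."""
    return subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{host_dir}:{work_dir}",   # share data with the host
         "-w", work_dir,                   # run the task in the mounted dir
         image] + command,
        check=True)

# Hypothetical usage: a stream-processing step packaged as an image.
# run_task_in_container("example/dispel4py-task:latest",
#                       ["python", "process.py", "input.dat"],
#                       host_dir="/tmp/workflow-run")
```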
Experiences with workflows for automating data-intensive bioinformatics
2015
Biology Direct
and carry out data management and analysis tasks on a large scale. ...
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources ...
MK and OK were supported by National Science Fund of Bulgaria within the "Methods for Data Analysis and Knowledge Discovery in Big Sequencing Dataset" project under contract DFNI02/7 of 12.12.2014. ...
doi:10.1186/s13062-015-0071-8
pmid:26282399
pmcid:PMC4539931
fatcat:cxotvdjwrndblm7gvu5myegsrq
Provenance for MapReduce-based data-intensive workflows
2011
Proceedings of the 6th workshop on Workflows in support of large-scale science - WORKS '11
MapReduce has been widely adopted by many business and scientific applications for data-intensive processing of large datasets. ...
There are increasing efforts for workflows and systems to work with the MapReduce programming model and the Hadoop environment including our work on a higher-level programming model for MapReduce within ...
This work was supported by NSF SDCI Award OCI-0722079 for Kepler/CORE and ABI Award DBI-1062565 for bioKepler, DOE SciDAC Award DE-FC02-07ER25811 for SDM Center, the UCGRID Project, and an SDSC Triton ...
doi:10.1145/2110497.2110501
dblp:conf/sc/CrawlWA11
fatcat:stchkdubsvajxidqsfqckbqdfi
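Since this entry combines the MapReduce programming model with provenance capture, a small self-contained sketch may help: an in-memory map/reduce that also records which input split each output key was derived from. This is a generic illustration, not the Kepler/Hadoop implementation the paper describes:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (word, 1) pairs, tagging each with the input it came from."""
    for source, line in records:
        for word in line.split():
            yield word, 1, source

def reduce_phase(mapped):
    """Reduce: sum counts per word and accumulate provenance (input sources)."""
    counts = defaultdict(int)
    provenance = defaultdict(set)
    for word, count, source in mapped:
        counts[word] += count
        provenance[word].add(source)
    return counts, provenance

# Hypothetical usage with two "input splits".
records = [("fileA", "workflow data workflow"), ("fileB", "data provenance")]
counts, provenance = reduce_phase(map_phase(records))
print(counts["workflow"], sorted(provenance["data"]))  # 2 ['fileA', 'fileB']
```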
Confuga: Scalable Data Intensive Computing for POSIX Workflows
2015
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
To address this gap, we introduce Confuga, a scalable data-intensive computing system that is largely compatible with the POSIX environment. ...
This approach is highly effective when the objective is to compute relatively simple functions on colossal amounts of data, but it is not a good match for a scientific computing environment which depends ...
See the URLs below for source code and workflows used in this paper. http://ccl.cse.nd.edu/ https://github.com/cooperative-computing-lab/ cctools/tree/papers/confuga-ccgrid2015 ...
doi:10.1109/ccgrid.2015.95
dblp:conf/ccgrid/DonnellyHT15
fatcat:o6u7duptlvhrbbarmcfgcyy7be
Adaptive Caching for Data-Intensive Scientific Workflows in the Cloud
[chapter]
2019
Lecture Notes in Computer Science
In this paper, we propose an adaptive caching solution for data-intensive workflows in the cloud. ...
Since it is common for workflow users to reuse other workflows or data generated by other workflows, a promising approach for efficient workflow execution is to cache intermediate data and exploit it to ...
IFB (ANR-11-INBS-0013) from the Agence Nationale de la Recherche and the France Grille Scientific Interest Group. ...
doi:10.1007/978-3-030-27618-8_33
fatcat:uliaxc3nmbamvibtx2noq4bfja
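The core idea in this snippet, caching intermediate data so that later workflows can reuse it, can be sketched as memoization keyed by a hash of the task name and its inputs. The cache directory and helper below are illustrative; the paper's adaptive cache-selection algorithm is not reproduced:

```python
import hashlib
import json
import os
import pickle

CACHE_DIR = "/tmp/wf-cache"  # illustrative cache location

def cached_run(task_name, inputs, compute):
    """Return cached intermediate data if this task/input combination was
    already executed; otherwise run `compute` and store its result."""
    key = hashlib.sha256(
        json.dumps([task_name, inputs], sort_keys=True).encode()).hexdigest()
    path = os.path.join(CACHE_DIR, key)
    if os.path.exists(path):                 # cache hit: skip re-execution
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute(*inputs)                # cache miss: execute the task
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

# Hypothetical usage: the second call reuses the stored intermediate result.
square = lambda x: x * x
print(cached_run("square", [12], square))
print(cached_run("square", [12], square))
```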
Automating Data-Throttling Analysis for Data-Intensive Workflows
2012
2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
The method obtains data-throttling values for the data transfer to enable network bandwidth and buffer/storage capacity to be managed more efficiently. ...
We convert a DAG representation into a Petri net model and analyse the resulting graph using an iterative method to compute data-throttling values. ...
A centralised approach utilises a central point for data transmission. This solution is not scalable and is suited only to systems where the time for data transfers is much smaller than that of computations. ...
doi:10.1109/ccgrid.2012.27
dblp:conf/ccgrid/RodriguezTR12
fatcat:muxb56yprfeljeer7nx6fvx24y
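The first step mentioned in the snippet, converting a workflow DAG into a Petri net, can be sketched with the standard mapping in which each task becomes a transition and each data dependency becomes a place; the iterative throttling computation itself is specific to the paper and is not reproduced here:

```python
def dag_to_petri_net(edges):
    """Map a workflow DAG to a Petri-net structure.

    Each task becomes a transition; each dependency (u, v) becomes a place
    p_u_v that is an output of u and an input of v (a token models the
    intermediate data being available for transfer)."""
    transitions = {t for edge in edges for t in edge}
    places = []
    inputs = {t: [] for t in transitions}
    outputs = {t: [] for t in transitions}
    for u, v in edges:
        place = f"p_{u}_{v}"
        places.append(place)
        outputs[u].append(place)   # u produces the data
        inputs[v].append(place)    # v consumes it
    return transitions, places, inputs, outputs

# Hypothetical diamond-shaped workflow: A feeds B and C, which both feed D.
net = dag_to_petri_net([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
print(net[1])  # ['p_A_B', 'p_A_C', 'p_B_D', 'p_C_D']
```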
Scalable Deployment of a LIGO Physics Application on Public Clouds: Workflow Engine and Resource Provisioning Techniques
[chapter]
2014
Cloud Computing for Data-Intensive Applications
In order to provide users an automated and scalable platform for hosting scientific workflow applications, while hiding the complexity of the underlying Cloud infrastructure, we present the design and ...
volume of data and high compute load, flash crowds, unpredictable load, and varying compute and storage requirements. ...
the Workflow Engine for the LIGO experiment. ...
doi:10.1007/978-1-4939-1905-5_1
fatcat:enp3mljaszc4ld7lkip5ouusvi
Skyport - Container-Based Execution Environment Management for Multi-cloud Scientific Workflows
2014
2014 5th International Workshop on Data-Intensive Computing in the Clouds
As an extension to AWE/Shock, our data analysis platform that provides scalable workflow execution environments for scientific data in the cloud, Skyport greatly reduces the complexity associated with providing the environment necessary to execute complex workflows. ...
Storage and analysis of such data has made it necessary to exploit grid and cloud computing resources with efficient workflow management systems, making it possible to process data quickly while at the ...
doi:10.1109/datacloud.2014.6
dblp:conf/sc/GerlachTKHWBDDM14
fatcat:izi4g67uknh7hbemoyilt4cx7a
On the use of burst buffers for accelerating data-intensive scientific workflows
2017
Proceedings of the 12th Workshop on Workflows in Support of Large-Scale Science - WORKS '17
Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping up with compute and memory performance. ...
In a recent survey on the management of data-intensive workflows [19], several techniques and strategies, including scheduling and parallel processing, are presented on how workflow systems manage data-intensive ...
doi:10.1145/3150994.3151000
dblp:conf/sc/SilvaCD17
fatcat:tomvvnfqgnbdlbjyl67x4u7z5y
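The burst-buffer pattern behind this entry, staging data through a fast intermediate tier instead of reading and writing the parallel file system directly, can be sketched as a copy-in/run/copy-out wrapper around a task. The paths and the reproject command are placeholders, not the configuration evaluated in the paper:

```python
import shutil
import subprocess
from pathlib import Path

def run_with_burst_buffer(task_cmd, inputs,
                          bb_dir="/bb/job123", pfs_out="/pfs/results"):
    """Stage inputs into a fast burst-buffer directory, run the task there,
    then drain outputs back to the parallel file system."""
    bb = Path(bb_dir)
    bb.mkdir(parents=True, exist_ok=True)
    for f in inputs:                              # stage in from the PFS
        shutil.copy(f, bb)
    subprocess.run(task_cmd, cwd=bb, check=True)  # task I/O hits the burst buffer
    Path(pfs_out).mkdir(parents=True, exist_ok=True)
    for f in bb.glob("*.out"):                    # stage out the results
        shutil.copy(f, pfs_out)

# Hypothetical usage (placeholder command and paths):
# run_with_burst_buffer(["reproject", "tile1.fits"], ["/pfs/inputs/tile1.fits"])
```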
Data-Intensive Workflow Optimization Based on Application Task Graph Partitioning in Heterogeneous Computing Systems
2014
2014 IEEE Fourth International Conference on Big Data and Cloud Computing
This paper presents a dual objective Partitioning based Data-intensive Workflow optimization Algorithm (PDWA) for heterogeneous computing systems. ...
Optimization of these performance metrics in heterogeneous computing environment becomes more challenging due to the difference in the computing capacity of execution nodes and variations in the data transfer ...
ACKNOWLEDGMENT The work presented in this paper is supported by the Ministry of Education Malaysia (FRGS FP051-2013A and UMRG RP001F-13ICT). ...
doi:10.1109/bdcloud.2014.63
dblp:conf/bdcloud/AhmadLRMK14
fatcat:q4vegly4tbg5joh6po3miz3j2e
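As a hedged illustration of partitioning-based workflow optimization in general, the greedy heuristic below places each task in the partition to which it has the heaviest data-transfer edges while keeping partition sizes bounded; it is purely illustrative and is not the dual-objective PDWA algorithm:

```python
import math

def greedy_partition(tasks, edges, k=2):
    """Greedily assign tasks to k partitions: each task goes to the partition
    (with free capacity) to which it has the heaviest data-transfer edges, so
    large intermediate transfers stay local while sizes remain balanced."""
    capacity = math.ceil(len(tasks) / k)
    parts = {p: set() for p in range(k)}
    assignment = {}
    for task in tasks:  # tasks assumed listed in topological order
        open_parts = [p for p in parts if len(parts[p]) < capacity]
        def affinity(p):
            # Total weight of edges between `task` and tasks already in p.
            return sum(w for (u, v, w) in edges
                       if (u == task and v in parts[p])
                       or (v == task and u in parts[p]))
        best = max(open_parts, key=affinity)
        parts[best].add(task)
        assignment[task] = best
    return assignment

# Hypothetical workflow: A-B and C-D exchange large data, cross edges are small.
print(greedy_partition(["A", "B", "C", "D"],
                       [("A", "B", 10), ("C", "D", 10),
                        ("A", "C", 1), ("B", "D", 1)]))
# {'A': 0, 'B': 0, 'C': 1, 'D': 1}
```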