
Towards A Methodology and Framework for Workflow-Driven Team Science [article]

Ilkay Altintas, Shweta Purawat, Daniel Crawl, Alok Singh, Kyle Marcus
2019 arXiv   pre-print
However, data and computing advances continuously change the way scientific workflows get developed and executed, pushing the scientific activity to be more data-driven, heterogeneous and collaborative  ...  Scientific workflows are powerful tools for management of scalable experiments, often composed of complex tasks running on distributed resources.  ...  The authors would also like to thank Workflows for Data Science (WorDS) Center of Excellence team members Mai Nguyen and Volkan Vural for their participation in the discussions leading to this paper.  ... 
arXiv:1903.01403v1 fatcat:ubbe6qk5kfb77erhetvp25jele

Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework

Zhenlong Li, Chaowei Yang, Baoxuan Jin, Manzhu Yu, Kai Liu, Min Sun, Matthew Zhan, Moncho Gomez-Gesteira
2015 PLoS ONE  
To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics.  ...  A service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework.  ...  George Taylor helped proof an earlier version of the manuscript. Author Contributions  ... 
doi:10.1371/journal.pone.0116781 pmid:25742012 pmcid:PMC4351198 fatcat:qwlkh75jabcplpjs24emnfixyi

Automatic Selection and Parameter Configuration of Big Data Software Core Components Based on Retention Pattern

Ping Xu, Yi-Zhang Jiang
2021 Mathematical Problems in Engineering  
This paper conducts an in-depth analysis and research on the automatic selection and parameter configuration of the core components of Big Data software by using the retention model and the automatic selection  ...  of Big Data components by establishing a standardized requirement index and using the decision tree model to solve the problem of component selection in Big Data application development.  ...  Introduction Big Data technology is no longer unfamiliar to us, and applications of Big Data technology are everywhere.  ... 
doi:10.1155/2021/6667275 fatcat:562xlsfmhncfxehqgdbe44bewu

DARE: A Reflective Platform Designed to Enable Agile Data-Driven Research on the Cloud

Iraklis Klampanos, Federica Magnoni, Emanuele Casarotti, Christian Page, Mike Lindner, Andreas Ikonomopoulos, Vangelis Karkaletsis, Athanasios Davvetas, Andre Gemund, Malcolm Atkinson, Antonios Koukourikos, Rosa Filgueira (+3 others)
2019 2019 15th International Conference on eScience (eScience)  
It implements the cataloguing and execution of fine-grained and Python-based dispel4py workflows as services.  ...  The DARE platform has been designed to help research developers deliver user-facing applications and solutions over diverse underlying e-infrastructures, data and computational contexts.  ...  and user information, e.g. workflows, software components, data provenance and data. 5) Integrated big data tools, as well as connectors to external data sources. 6) Exposing all relevant functionality  ... 
doi:10.1109/escience.2019.00079 dblp:conf/eScience/KlampanosMCPLIK19 fatcat:7immlbim2racxa534ovdylysfi

Multilevel Workflow System in the ATLAS Experiment

M Borodin, K De, J Garcia Navarro, D Golubkov, A Klimentov, T Maeno, A Vaniachine
2015 Journal of Physics, Conference Series  
The ATLAS experiment is scaling up Big Data processing for the next LHC run using a multilevel workflow system comprised of many layers.  ...  In Big Data processing ATLAS deals with datasets, not individual files.  ...  Acknowledgments We thank all our colleagues who contributed to the ATLAS Big Data processing infrastructure development and operations. This work was funded in part by the U. S.  ... 
doi:10.1088/1742-6596/608/1/012015 fatcat:e4j4fxlufrbrrlk5h762ghx6eq

Workflow Coordinated Resources Allocation for Big Data Analytics in the Cloud [chapter]

Niki Sfika, Konstantinos Manos, Aigli Korfiati, Christos Alexakos, Spiridon Likothanassis
2015 IFIP Advances in Information and Communication Technology  
The proposed architecture aims to achieve a certain level of dynamic scalability and flexibility in applications executing complex and highly demanding Big Data Analytics processes.  ...  Thus, the on demand provision of both hardware and software as a service from cloud providers makes cloud computing sufficient to be used as infrastructure for Big Data Analytics applications.  ... 
doi:10.1007/978-3-319-23868-5_28 fatcat:u24bd2ieh5e6xn23zf5yz3xo3m

A Review of Scalable Bioinformatics Pipelines

Bjørn Fjukstad, Lars Ailo Bongo
2017 Data Science and Engineering  
Here, we survey several scalable bioinformatics pipelines and compare their design and their use of underlying frameworks and infrastructures.  ...  The pipelines used to implement analyses must therefore scale with respect to the resources on a single compute node, the number of nodes on a cluster, and also to cost-performance.  ...  Software containerization packages an application and its dependencies in an isolated execution environment. One popular implementation of software container is Docker [25] .  ... 
doi:10.1007/s41019-017-0047-z fatcat:7wyzccy7ffhjdd46pfmljrzioy

Trident: scalable compute archives: workflows, visualization, and analysis

Arvind Gopu, Soichi Hayashi, Michael D. Young, Ralf Kotulla, Robert Henschel, Daniel Harbeck, Gianluca Chiozzi, Juan C. Guzman
2016 Software and Cyberinfrastructure for Astronomy IV  
The software suite currently consists of (1) a simple Workflow execution framework to integrate, deploy, and execute pipelines and applications (2) a Progress service to monitor workflows and sub-workflows  ...  However, data processing is only a small part of the Big Data challenge.  ...  The Trident microservice software suite consists of (1) a Workflow execution framework to integrate, deploy, and execute workflows (2) a Progress service to monitor workflows and sub-workflows (for eg.  ... 
doi:10.1117/12.2233111 fatcat:gafbjj3agngfxeir5lc7cod25e

A view of programming scalable data analysis: from clouds to exascale

Domenico Talia
2019 Journal of Cloud Computing: Advances, Systems and Applications  
Scalable big data analysis today can be achieved by parallel implementations that are able to exploit the computing and storage facilities of high performance computing (HPC) systems and clouds, whereas  ...  Scalability is a key feature for big data analysis and machine learning frameworks and for applications that need to analyze very large and real-time data available from data repositories, social media  ...  Availability of data and materials Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.  ... 
doi:10.1186/s13677-019-0127-x fatcat:l5mimqzwibh7fn4fedlsz4jkji

Scalable Workflow-Driven Hydrologic Analysis in HydroFrame [chapter]

Shweta Purawat, Cathie Olschanowsky, Laura E. Condon, Reed Maxwell, Ilkay Altintas
2020 Lecture Notes in Computer Science  
Towards this goal, we present a design that leverages provenance data and machine learning techniques to predict performance and forecast failures using an automatic performance collection component of  ...  The Kepler workflow ensures complete reproducibility through a built-in provenance framework that collects workflow-specific parameters, software versions, and hardware system configuration.  ...  [22] examines the challenges in hydrology related to big data, data flow, and model management, and builds a preliminary adaptation of the Kepler Scientific Workflow System for hydrology applications  ... 
doi:10.1007/978-3-030-50371-0_20 fatcat:c6632vj24nd25dcdq73i6bxhyu

Scalable Workflows and Reproducible Data Analysis for Genomics [chapter]

Francesco Strozzi, Roel Janssen, Ricardo Wurmus, Michael R. Crusoe, George Githinji, Paolo Di Tommaso, Dominique Belhachemi, Steffen Möller, Geert Smant, Joep de Ligt, Pjotr Prins
2019 Methods in Molecular Biology  
Each of which can be run in parallel. We show how to bundle a number of tools used in evolutionary biology by using Debian, GNU Guix, and Bioconda software distributions, along with the use of container  ...  this chapter we show how to describe and execute the same analysis using a number of workflow systems and how these follow different approaches to tackle execution and reproducibility issues.  ...  With the advent of "deep neural networks" and the general adoption of machine learning techniques for Big Data, GPUs have become a mainstream technology in data mining.  ... 
doi:10.1007/978-1-4939-9074-0_24 pmid:31278683 fatcat:qtv5vvi2vjhkvaqp42cvqo5eh4

Scalable data analysis in proteomics and metabolomics using BioContainers and workflows engines [article]

Yasset Perez-Riverol, Pablo Moreno
2019 bioRxiv   pre-print
We discuss the combination of software containers with workflow environments for large-scale metabolomics and proteomics analysis.  ...  In this paper we overview the key steps of metabolomics and proteomics data processing, including the main tools and software used to perform the data analysis.  ...  This abstraction level is guaranteed by using an execution layout that defines which type of containers will be used to execute the tools (components of the workflow) and which .  ... 
doi:10.1101/604413 fatcat:doy6yj6g3jhjzi5kn3bzaimt3m

Workflow-Driven Distributed Machine Learning in CHASE-CI: A Cognitive Hardware and Software Ecosystem Community Infrastructure [article]

Ilkay Altintas, Kyle Marcus, Isaac Nealey, Scott L. Sellars, John Graham, Dima Mishin, Joel Polizzi, Daniel Crawl, Thomas DeFanti, Larry Smarr
2019 arXiv   pre-print
This paper contributes a workflow-driven approach for dynamic data-driven application development on top of a new kind of networked Cyberinfrastructure called CHASE-CI.  ...  The advances in data, computing and networking over the last two decades led to a shift in many application domains that includes machine learning on big data as a part of the scientific process, requiring  ...  The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.  ... 
arXiv:1903.06802v1 fatcat:2dw4sw2k6rhtninq2sxezmrr34

Big Data Workflows: Locality-Aware Orchestration Using Software Containers

Andrei-Alin Corodescu, Nikolay Nikolov, Akif Quddus Khan, Ahmet Soylu, Mihhail Matskin, Amir H. Payberah, Dumitru Roman
2021 Sensors  
This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront.  ...  The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers  ...  Scalable Execution of Big Data Workflows using Software Containers.  ... 
doi:10.3390/s21248212 pmid:34960302 pmcid:PMC8706844 fatcat:3nc2j4pvdfdynn573zzq7ympca

Computational Strategies for Scalable Genomics Analysis

Lizhen Shi, Zhong Wang
2019 Genes  
Various big data technologies have been explored to scale up/out current bioinformatics solutions to mine the big genomics data.  ...  We comment on the pros and cons of each strategy in the context of ease of development, robustness, scalability, and efficiency.  ...  Another solution is to use a container-native workflow engine such as Argo [55] , which supports both directed acyclic graph (DAG) and step-based workflows on Kubernetes.  ... 
doi:10.3390/genes10121017 pmid:31817630 fatcat:5clqzsy65jcw3mpojnqjznxrzm