Distributed data provenance for large-scale data-intensive computing

Dongfang Zhao, Chen Shou, Tanu Malik, Ioan Raicu
2013 2013 IEEE International Conference on Cluster Computing (CLUSTER)  
It has become increasingly important to capture and understand the origins and derivation of data (its provenance).  ...  A key issue in evaluating the feasibility of data provenance is its performance, overheads, and scalability.  ...  The authors are grateful to Xian-He Sun for providing the access to the Linux cluster.  ... 
doi:10.1109/cluster.2013.6702685 dblp:conf/cluster/ZhaoSMR13 fatcat:yxktbz7zu5gf7m2umseien36de

Special issue for data intensive eScience

Judy Qiu, Dennis Gannon
2012 Distributed and parallel databases  
Astronomy and the search for fundamental particles at the Large Hadron Collider drive mainstream aspects of data-intensive science with many petabytes of data derived from large advanced instruments.  ...  New fields are being born, such as drug discovery based on large scale study of correlations in published papers and climate implications from data on the accelerating pace of changes in previously quiescent  ...  Acknowledgements We would like to thank the authors for contributing papers on their research on latest trends in data intensive technologies and applications for this special issue, and thank all the  ... 
doi:10.1007/s10619-012-7107-1 fatcat:s3yzixkgprfm5cbijcxzyziuem

Opportunities and Challenges in Running Scientific Workflows on the Cloud

Yong Zhao, Xubo Fei, Ioan Raicu, Shiyong Lu
2011 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery  
Cloud computing; Scientific Workflow; Cloud workflow; Data Intensive Computing  ...  We coin the term "Cloud Workflow", to refer to the specification, execution, provenance tracking of large-scale scientific workflows, as well as the management of data and computing resources to enable  ...  We define Cloud computing as a large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power  ... 
doi:10.1109/cyberc.2011.80 dblp:conf/cyberc/ZhaoFRL11 fatcat:zlpe7p73lnh2jksgtnstvkafn4

Challenges and approaches for distributed workflow-driven analysis of large-scale biological data

Ilkay Altintas, Jianwu Wang, Daniel Crawl, Weizhong Li
2012 Proceedings of the 2012 Joint EDBT/ICDT Workshops on - EDBT-ICDT '12  
Middleware and technologies for scientific workflows and data-intensive computing promise new capabilities to enable rapid analysis of next-generation sequence data.  ...  the development of Kepler workflows for integrated execution of bioinformatics applications in distributed environments.  ...  To date, there have been a number of studies for data-intensive analysis of large-scale bioinformatics datasets on Cloud computing platforms.  ... 
doi:10.1145/2320765.2320791 dblp:conf/edbt/AltintasWCL12 fatcat:lot2dlhp4fh45izbyqdiw3ta2y

OUP accepted manuscript

2017 Briefings in Bioinformatics  
Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto  ...  In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements.  ...  and João Bosco Jares for helping them in drawing the Figure 2.  ... 
doi:10.1093/bib/bbx039 pmid:28419324 pmcid:PMC6169675 fatcat:xsdfepwqmnb7pgkmnep2ifn2wi

Cloud Computing and Grid Computing 360-Degree Compared [article]

Ian Foster, Yong Zhao, Ioan Raicu, Shiyong Lu
2008 arXiv   pre-print
such as utility computing, cluster computing, and distributed systems in general.  ...  Cloud Computing has become another buzzword after Web 2.0. However, there are dozens of different definitions for Cloud Computing and there seems to be no consensus on what a Cloud is.  ...  for both Cloud Computing and Client Computing with the increase of data-intensive applications.  ... 
arXiv:0901.0131v1 fatcat:ybwx6z62k5fnpcyicyxvxsydou

Big Data Provenance: Challenges and Implications for Benchmarking [chapter]

Boris Glavic
2014 Lecture Notes in Computer Science  
This paper reviews existing approaches for large-scale distributed provenance and discusses potential challenges for Big Data benchmarks that aim to incorporate provenance data/management.  ...  Provenance has been studied by the database, workflow, and distributed systems communities, but provenance for Big Data - let us call it Big Provenance - is a largely unexplored field.  ...  This enables analysts with little knowledge about distributed systems to run large scale analytics.  ... 
doi:10.1007/978-3-642-53974-9_7 fatcat:zjoio7iemre57ktgvzzssqs7lm

Towards web-scale how-provenance

Daniel Deutch, Amir Gilad, Yuval Moskovitch
2015 2015 31st IEEE International Conference on Data Engineering Workshops  
The annotation of data with meta-data, and its propagation through data-intensive computation in a way that follows the transformations that the data undergoes ("how-provenance"), has many applications,  ...  We envision an approach for addressing this complex problem, through allowing selective tracking of how-provenance, where the selection criteria are partly based on the meta-data itself.  ...  Additionally, large-scale computation is often distributed, and thus the (selective) generation of provenance needs to be correspondingly distributed among the peers.  ... 
doi:10.1109/icdew.2015.7129547 dblp:conf/icde/DeutchGM15a fatcat:7kjdymggufggvkmgzaxgkwn5qa

Reusing distributed computing software and patterns for midscale collaborative science

Paschalis Paschos, Mats Rynge, Benedikt Riedel, Frank Wuerthwein, Robert William Gardner
2019 Zenodo  
Examples are re-use of the Rucio and FTS3 software for reliable data transfer and management, XRootD for data access and caching, Ceph for large scale pre-processing storage, and Pegasus for workflow management  ...  In the Open Science Grid, we have organized a team designed to support collaborative science organizations re-use proven software and patterns in distributed processing and data management, often but not  ...  Motivation & Background: In Open Science Grid we support science at all scales. Large scale science: significant software and computing teams; experiment-specific workload and data management systems  ... 
doi:10.5281/zenodo.3599634 fatcat:kfcbwuqysbdqli6z2j2iqn34jq

Data Mining in Earth System Science (DMESS 2011)

Forrest M. Hoffman, J. Walter Larson, Richard Tran Mills, Bjørn-Gustaf J. Brooks, Auroop R. Ganguly, William W. Hargrove, Jian Huang, Jitendra Kumar, Ranga R. Vatsavai
2011 Procedia Computer Science  
From field-scale measurements to global climate simulations and remote sensing, the growing body of very large and long time series Earth science data are increasingly difficult to analyze, visualize,  ...  The size and complexity of Earth science data exceed the limits of most analysis tools and the capacities of desktop computers.  ...  In addition to the data management issues of provenance, curation, metadata creation, and public distribution, today's large and complex Earth science data often cannot be synthesized and analyzed using  ... 
doi:10.1016/j.procs.2011.04.157 fatcat:rwvpv2pzk5h4vmhiscyndohhim

Distributed Database Research at COPPE/UFRJ

Marta Mattoso, Vanessa Braganholo, Alexandre A. B. Lima, Leonardo Murta
2011 Journal of Information and Data Management  
More recently, large-scale scientific data combined with process activities management have introduced challenges to the database and software engineering communities, among several other computer science  ...  Since each scientific experiment tends to produce and manage its own data, in specific formats, with its own activities (and programs), managing large scale distributed data and activities gets difficult  ...  ACKNOWLEDGEMENTS We would like to thank the High Performance Computing Center (NACAD-COPPE/UFRJ) and Grid'5000 (INRIA) where the experiments were performed.  ... 
dblp:journals/jidm/MattosoBLM11 fatcat:rekvtcyqhfgfljyebqzktgzw5u

DARE Platform: a Developer-Friendly and Self-Optimising Workflows-as-a-Service Framework for e-Science on the Cloud

Iraklis Klampanos, Chrysoula Themeli, Alessandro Spinuso, Rosa Filgueira, Malcolm Atkinson, André Gemünd, Vangelis Karkaletsis
2020 Journal of Open Source Software  
In recent years, science has relied more than ever on large-scale data as well as on distributed computing and human resources.  ...  Scientists and research engineers in fields such as climate science and computational seismology, constantly strive to make good use of remote and largely heterogeneous computing resources (HPC, Cloud,  ...  Statement of need In recent years, science has relied more than ever on large-scale data as well as on distributed computing and human resources.  ... 
doi:10.21105/joss.02664 fatcat:zx3k5fughnellcq5wtfpvgfo5m

Enabling Provenance on Large Scale e-Science Applications [chapter]

Miguel Branco, Luc Moreau
2006 Lecture Notes in Computer Science  
Large-scale e-Science experiments present unprecedented data handling requirements with their multi-petabyte data storages.  ...  In this paper, we introduce a multi-phase infrastructure to achieve data provenance for an e-Science experiment.  ...  Conclusion The proposed provenance infrastructure enables provenance-awareness for large-scale e-Science experiments, particularly those handling large volumes of data.  ... 
doi:10.1007/11890850_7 fatcat:klpvdfqnhngotgvsuok4tieio4

Big Data Provenance: State-Of-The-Art Analysis and Emerging Research Challenges

Alfredo Cuzzocrea
2016 International Conference on Extending Database Technology  
This contribution aims at representing a milestone in the exciting big data provenance research road.  ...  Big data provenance is actually one of the most relevant problem in big data research, as confirmed by the great deal of attention devoted to this topic by larger and larger database and data mining research  ...  This poses relevant issues, as big data are typically growing-in-size and large-scale.  ... 
dblp:conf/edbt/Cuzzocrea16 fatcat:rbx2znksnzhhvjuymparisgeni

Integrating prediction, provenance, and optimization into high energy workflows

M Schram, V Bansal, R D Friese, N R Tallent, J Yin, K J Barker, E Stephan, M Halappanavar, D J Kerbyson
2017 Journal of Physics, Conference Series  
I/O optimizations such as prefetching; and provenance methods for collecting performance data.  ...  We propose a novel approach for efficient execution of workflows on distributed resources.  ...  Acknowledgments Todd Elsethagen (PNNL) and Bibi Raju (PNNL) are part of the ProvEn team. We are grateful for funding support from the U.S.  ... 
doi:10.1088/1742-6596/898/6/062052 fatcat:uvdozmus2vhttakr2ydhqdmxo4