Provenance in Databases: Past, Current, and Future

Wang Chiew Tan
2007 IEEE Data Engineering Bulletin  
The need to understand and manage provenance arises in almost every scientific application.  ...  For these reasons, the knowledge of provenance of a scientific result is typically regarded to be as important as the result itself.  ...  Readers who are interested in workflow provenance may find the following references useful: A survey of provenance research related to scientific data processing and scientific workflow systems [5, 19  ... 
dblp:journals/debu/Tan07 fatcat:m6q4ulqzjja2ficlhf6cv67jci

Retrospective Provenance Without a Runtime Provenance Recorder

Timothy M. McPhillips, Shawn Bowers, Khalid Belhajjame, Bertram Ludäscher
2015 Workshop on the Theory and Practice of Provenance  
We present scientifically meaningful retrospective provenance queries for investigating an execution of a data acquisition workflow implemented as a Python script, and show how these queries can be evaluated  ...  YW tools extract and analyze these comments, represent scripts in terms of entities based on a typical scientific workflow model, and provide graphical workflow views (i.e., prospective provenance) of  ...  We report in this section a number of typical provenance queries that are expressed against our example script.  ... 
dblp:conf/tapp/McPhillipsBBL15 fatcat:lzdprpzqgnanjlilkpzrviyvye

Semantic Provenance for eScience: Managing the Deluge of Scientific Data

Satya S. Sahoo, Amit Sheth, Cory Henson
2008 IEEE Internet Computing  
Workflow engines relying purely on WSDL descriptions to derive provenance information might not be a sustainable approach.  ...  The second type of query is answered using semantic provenance.  ... 
doi:10.1109/mic.2008.86 fatcat:4qtokzdkmva57djepa5dismvqi

Connecting Scientific Data to Scientific Experiments with Provenance

Simon Miles, Ewa Deelman, Paul Groth, Karan Vahi, Gaurang Mehta, Luc Moreau
2007 Third IEEE International Conference on e-Science and Grid Computing (e-Science 2007)  
In this paper, we describe preliminary work on adapting Pegasus to capture the process of workflow refinement in the PASOA provenance system.  ...  To reconnect the scientific data with the original experiment, we argue that scientists should have access to the full provenance of their data, including not only parameters, inputs and intermediary data  ...  • Why is this intermediate data not in the registry?  ... 
doi:10.1109/e-science.2007.22 dblp:conf/eScience/MilesDGVMM07 fatcat:dn7i4mxwoffzlpa65ks4dykis4

Provenance for MapReduce-based data-intensive workflows

Daniel Crawl, Jianwu Wang, Ilkay Altintas
2011 Proceedings of the 6th workshop on Workflows in support of large-scale science - WORKS '11  
However, to date, provenance of MapReduce-based workflows and its effects on workflow execution performance have not been studied in depth.  ...  the Kepler Scientific Workflow System.  ...  Each Map sub-workflow runs BLAST to process a subset of the query sequences against the reference database. In the Reduce sub-workflow, the outputs for each subset are merged into one output.  ... 
doi:10.1145/2110497.2110501 dblp:conf/sc/CrawlWA11 fatcat:stchkdubsvajxidqsfqckbqdfi

Enhancing workflow with a semantic description of scientific intent

Edoardo Pignotti, Peter Edwards, Nick Gotts, Gary Polhill
2011 Journal of Web Semantics  
This paper proposes an abstract model of intent based on the Open Provenance Model (OPM) specification.  ...  Current workflow technologies do not incorporate any representation of experimental constraints and goals, which we refer to in this paper as scientist's intent.  ...  [15] expands on the Zachman Framework [16] by presenting the '7 W's of Provenance': Who, What, Where, Why, When, Which, & (W)How.  ... 
doi:10.1016/j.websem.2011.05.001 fatcat:jbo4xgu6rjhotgazjpxkst4t34

Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance

Qian Zhang, Yang Cao, Qiwen Wang, Duc Vu, Priyaa Thavasimani, Timothy McPhillips, Paolo Missier, Peter Slaughter, Christopher Jones, Mathew B. Jones, Bertram Ludäscher
2018 International Journal of Digital Curation  
Runtime observables can be linked to prospective provenance via relational views and queries.  ...  Users provide prospective provenance, i.e., the conceptual workflows latent in scripts, via simple YesWorkflow annotations, embedded as script comments.  ...  YW-NW bridge approach can not only enable queries that can be answered by YW, but also answer queries for runtime data information that can't be answered solely by YW.  ... 
doi:10.2218/ijdc.v12i2.585 fatcat:a77rphjemjfi5di4broibpe2pu

Applying the Virtual Data Provenance Model [chapter]

Yong Zhao, Michael Wilde, Ian Foster
2006 Lecture Notes in Computer Science  
To support such analyses, we have developed a "virtual data system" that allows users first to define, then to invoke, and finally explore the provenance of procedures (and workflows comprising multiple  ...  We provide here an overview of this integration, the queries and transformations that it enables, and examples of how these capabilities can serve scientific processes.  ...  [BKT01] distinguish between why-provenance and where-provenance.  ... 
doi:10.1007/11890850_16 fatcat:b3kbldaowvgebojqhqujfgsemm

Data Provenance for Sport [article]

Andrew J. Simmons, Scott Barnett, Simon Vajda, Rajesh Vasa
2018 arXiv pre-print
Our findings suggest that one-size-fits-all provenance and workflow systems are a poor fit in practice, and that their notation and functionality need to be optimised for the domain of use.  ...  Standards for representing data provenance (i.e. the origins of the data), such as the W3C PROV standard, can assist with this process, however require a mapping between abstract provenance concepts and  ...  Although the black box nature of workflows prevents support of "why" provenance and "where" provenance methods designed for analysing provenance of SQL query results, we noted that workflows implicitly  ... 
arXiv:1812.05804v1 fatcat:b6ekgafktndxbl6mcnt4ooop6u

A language for provenance access control

Tyrone Cadenhead, Vaibhav Khadilkar, Murat Kantarcioglu, Bhavani Thuraisingham
2011 Proceedings of the first ACM conference on Data and application security and privacy - CODASPY '11  
Provenance is a directed acyclic graph that explains how a resource came to be in its current form. Traditional access control does not support provenance graphs.  ...  In this paper, we propose a language that complements and extends existing access control languages to support provenance. This language also provides access to data based on integrity criteria.  ...  This would evaluate the following regular expression query on the provenance graph: Query Templates We can use the set of names in VG to answer common queries about provenance such as why-provenance  ... 
doi:10.1145/1943513.1943532 dblp:conf/codaspy/CadenheadKKT11 fatcat:ahl2dokrcbg5tovukxomtjkyj4

End-to-End provenance representation for the understandability and reproducibility of scientific experiments using a semantic approach

Sheeba Samuel, Birgitta König-Ries
2022 Journal of Biomedical Semantics  
The ontology is evaluated by answering competency questions over the knowledge base of scientific experiments consisting of computational and non-computational data and steps.  ...  Results We present the "REPRODUCE-ME" data model and ontology to describe the end-to-end provenance of scientific experiments by extending existing standards in the semantic web.  ...  Based on our understanding of scientific practices in the first step of our study, we identified that there are many experimental workflows that do not depend or require such complex scientific workflow  ... 
doi:10.1186/s13326-021-00253-1 pmid:34991705 pmcid:PMC8734275 fatcat:lwc57qbra5bcxiadcrimbufezi

ProvDB: Provenance-enabled Lifecycle Management of Collaborative Data Analysis Workflows

Hui Miao, Amol Deshpande
2018 IEEE Data Engineering Bulletin  
workflows.  ...  captures a large amount of fine-grained information about the analysis processes and versioned data artifacts in a semi-passive manner using a flexible and extensible ingestion mechanism; provides novel querying  ...  Unlike retrospective query facilities in scientific workflow provenance systems [20] , their processes are predefined in workflow skeletons, and multiple executions generate different instance-level provenance  ... 
dblp:journals/debu/0001D18 fatcat:ybx7j6hvnjanbjnmz7wyrmb2te

Computing Location-Based Lineage from Workflow Specifications to Optimize Provenance Queries [chapter]

Saumen Dey, Sven Köhler, Shawn Bowers, Bertram Ludäscher
2015 Lecture Notes in Computer Science  
We present a location-based approach for executing provenance lineage queries that significantly reduces query execution cost without incurring additional storage costs.  ...  The key idea of our approach is to exploit the fact that provenance graphs resemble the workflow graphs that generated them and that many workflow computation models assume workflow steps have statically  ...  Many applications of provenance within these systems rely on being able to easily pose and efficiently answer lineage queries, which for data-intensive workflows require evaluation techniques that are  ... 
doi:10.1007/978-3-319-16462-5_14 fatcat:ratfnv3ymfehronswy2v6vvfui

Using provenance to manage knowledge of In Silico experiments

R. Stevens, J. Zhao, C. Goble
2007 Briefings in Bioinformatics  
Recording of the provenance of an experiment (what was done; where, how and why, etc.) is an important aspect of scientific best practice that should be extended to in silico experimentation.  ...  In reviewing provenance support, we will review one of the important knowledge management issues in bioinformatics.  ...  In the first, each system had to run a workflow involved in brain imaging and then answer a set of queries.  ... 
doi:10.1093/bib/bbm015 pmid:17502335 fatcat:folt4biql5db7c5plbppukrrli

The Foundations for Provenance on the Web

Luc Moreau
2010 Foundations and Trends® in Web Science  
By that, it is meant that workflow or database management systems are in full control of the data they manage, and track their provenance within their own scope, but not beyond.  ...  Given that the majority of work on provenance has been undertaken by the database, workflow and e-science communities, some of their work is reviewed, contrasting approaches, and focusing on important  ...  you to Chaomei Chen for his help with CiteSpace, to Danius Michaelides for his help with scripts for processing bibliographical data, and Ewa Deelman, Paul Groth and Simon Miles for providing feedback on  ... 
doi:10.1561/1800000010 fatcat:zggat6oky5egripxkunwkkelqm