Computing Location-Based Lineage from Workflow Specifications to Optimize Provenance Queries [chapter]

Saumen Dey, Sven Köhler, Shawn Bowers, Bertram Ludäscher
2015 Lecture Notes in Computer Science  
We present a location-based approach for executing provenance lineage queries that significantly reduces query execution cost without incurring additional storage costs. The key idea of our approach is to exploit the fact that provenance graphs resemble the workflow graphs that generated them and that many workflow computation models assume workflow steps have statically defined data consumption-production (i.e., data input-output) rates. We describe a new lineage computation technique that uses the structure of workflow specifications together with consumption-production rates to pre-compute (i.e., to forecast) the access paths of all dependent data items prior to workflow execution. We also present experimental results showing that our approach can significantly outperform traditional data lineage query techniques.
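
The idea of forecasting dependencies from fixed consumption-production rates can be illustrated with a minimal sketch. It is not the chapter's algorithm. It assumes a hypothetical linear workflow in which each step is a pair (consume_rate, produce_rate), every firing reads the next consume_rate items from its input channel and appends produce_rate items to its output channel, and items are identified by their 0-based position on a channel. The names Step, input_positions, and lineage are invented for this illustration.

```python
from typing import List, Tuple

# Hypothetical model: a step is (consume_rate, produce_rate), both fixed
# before the workflow runs. This is an assumption for illustration only.
Step = Tuple[int, int]


def input_positions(step: Step, out_pos: int) -> range:
    """Positions on the step's input channel that output item out_pos depends on."""
    consume, produce = step
    firing = out_pos // produce  # which firing of the step emitted this item
    return range(firing * consume, (firing + 1) * consume)


def lineage(steps: List[Step], out_pos: int) -> List[List[int]]:
    """Walk a linear workflow backwards using only rates and positions.

    Returns one sorted list of positions per channel, starting with the
    final output channel and ending with the workflow's source channel.
    No recorded provenance graph is consulted.
    """
    frontier = {out_pos}
    levels = [sorted(frontier)]
    for step in reversed(steps):
        frontier = {p for pos in frontier for p in input_positions(step, pos)}
        levels.append(sorted(frontier))
    return levels


# Step A consumes 2 items and produces 1; step B consumes 1 and produces 3.
workflow = [(2, 1), (1, 3)]
print(lineage(workflow, 4))
```

Under these assumptions, the dependencies of an output item follow from arithmetic on its position, so a lineage query needs no traversal of stored dependency edges. Workflows with branching, merging, or data-dependent rates would need a richer model than this sketch.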
doi:10.1007/978-3-319-16462-5_14 fatcat:ratfnv3ymfehronswy2v6vvfui