OPQL: Querying scientific workflow provenance at the graph level

Chunhyeok Lim, Shiyong Lu, Artem Chebotko, Farshad Fotouhi, Andrey Kashlev
2013 Data & Knowledge Engineering  
Provenance has become increasingly important in scientific workflows to understand, verify, and reproduce the results of scientific data analysis. Most existing systems store provenance data in provenance stores with proprietary provenance data models and conduct query processing over the physical provenance storage using query languages, such as SQL, SPARQL, and XQuery, which are closely coupled to the underlying storage strategies. Querying provenance at such a low level leads to poor usability of the system: a user needs to know the underlying schema to formulate queries; if the schema changes, queries need to be reformulated; and queries formulated for one system will not run in another system. In this paper, we present OPQL, a provenance query language that enables the querying of provenance directly at the graph level. An OPQL query takes a provenance graph as input and produces another provenance graph as output. Therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our main contributions are: (i) we design OPQL, including six types of graph patterns, a provenance graph algebra, and OPQL syntax and semantics, that supports querying provenance at the graph level; (ii) we implement OPQL using a Web service via our OPMPROV system; therefore, users can invoke the Web service to execute OPQL queries in a provenance browser, called OPMPROVIS. The result of an OPQL query is displayed as a provenance graph in OPMPROVIS. An experimental study is conducted to evaluate the feasibility and performance of OPMPROV on OPQL provenance querying.
doi:10.1016/j.datak.2013.08.008 fatcat:jhxpmfqtefcalh3prsqyrmuuvu