A Templating System to Generate Provenance

Luc Moreau, Belfrit Victor Batlajery, Trung Dong Huynh, Danius Michaelides, Heather Packer
2018 IEEE Transactions on Software Engineering  
This work is licensed under a Creative Commons Attribution 3.0 License.

Abstract: PROV-TEMPLATE is a declarative approach that enables designers and programmers to design and generate provenance compatible with the PROV standard of the World Wide Web Consortium. Designers specify the topology of the provenance to be generated by composing templates, which are provenance graphs containing variables that act as placeholders for values. Programmers write programs that log values and package them up in sets of bindings, a data structure associating variables and values. An expansion algorithm generates instantiated provenance from templates and sets of bindings in any of the serialisation formats supported by PROV. A quantitative evaluation shows that sets of bindings have a size that is typically 40% of that of expanded provenance templates and that the expansion algorithm is suitably tractable, operating in fractions of milliseconds for the type of templates surveyed in the article. Furthermore, the approach shows four significant software engineering benefits: separation of responsibilities, provenance maintenance, potential runtime checks and static analysis, and provenance consumption. The article gathers quantitative data and qualitative descriptions of benefits from four different applications making use of PROV-TEMPLATE. The system is implemented and released in the open-source library ProvToolbox for provenance processing.

1 INTRODUCTION

Provenance has gained a lot of traction lately in various areas including the Web, legal notices, climate science, scientific workflows [1], [2], [3], computational reproducibility [4], emergency response [5], medical applications, the geospatial domain, art, and food. The recent standard PROV [6] of the World Wide Web Consortium defines provenance as "a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing." In an increasing number of applications, provenance has become crucial in making systems accountable, by exposing how information flows in systems, and in helping users decide whether information is to be trusted. Provenance is not restricted to computer systems; it can also be used to describe how objects are transformed and how people are involved in a physical system [5]. Applications and use cases for provenance are well documented in the literature [7], [8], [9], [10].
They include making systems more auditable and accountable [11], reproducing results [12], deriving trust and classification [13], asserting attribution and generating acknowledgments [14], supporting predictive analytics [13], and facilitating traceability [15]. To enable such powerful functionality, however, one needs to adapt or write applications so that they generate provenance information, which can then be exploited to offer new benefits to their users. A number of approaches have been proposed to generate provenance: at run time, at compile time, and retrospectively. Run-time generation typically requires applications to be instrumented, and provenance generated accordingly [16],
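
The abstract's three notions (a template with variables, a set of bindings, and an expansion step) can be illustrated with a minimal Python sketch. This is not the ProvToolbox implementation or its API: the tuple representation of statements, the `var:` marker, the `ex:` identifiers, and the `expand` function are all assumptions made for illustration, and the sketch binds each variable to a single value, whereas the actual expansion algorithm is described in the article itself.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a statement is (relation, arg1, arg2, ...).
Statement = Tuple[str, ...]

# Illustrative marker for placeholders (an assumption of this sketch).
VAR_PREFIX = "var:"

# A toy template: an activity used an input, generated an output,
# and was associated with an agent.
TEMPLATE: List[Statement] = [
    ("used", "var:activity", "var:input"),
    ("wasGeneratedBy", "var:output", "var:activity"),
    ("wasAssociatedWith", "var:activity", "var:agent"),
]

# A set of bindings logged by the application: variable -> value.
BINDINGS: Dict[str, str] = {
    "var:activity": "ex:compile1",
    "var:input": "ex:source.c",
    "var:output": "ex:program.exe",
    "var:agent": "ex:alice",
}


def expand(template: List[Statement], bindings: Dict[str, str]) -> List[Statement]:
    """Replace every variable in the template with its bound value.

    Raises KeyError if the template uses a variable that has no binding.
    Arguments that are not variables are copied unchanged.
    """
    expanded: List[Statement] = []
    for relation, *args in template:
        new_args = [bindings[a] if a.startswith(VAR_PREFIX) else a for a in args]
        expanded.append((relation, *new_args))
    return expanded


if __name__ == "__main__":
    for statement in expand(TEMPLATE, BINDINGS):
        print(statement)
```

In this sketch the application only has to log the four values in `BINDINGS`; the shape of the provenance lives entirely in `TEMPLATE`, which mirrors the separation of responsibilities between designers and programmers that the abstract describes.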
doi:10.1109/tse.2017.2659745 fatcat:ywn4lr67irgqnngxpy5pi6boji