A primer on provenance

Lucian Carata, Sherif Akoush, Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, Margo Seltzer, Andy Hopper
2014 Communications of the ACM  
Assessing the quality or validity of a piece of data is not usually done in isolation. You typically examine the context in which the data appears and try to determine its original sources or review the process through which it was created. This is not so straightforward when dealing with digital data, however: the result of a computation might have been derived from numerous sources and by applying complex successive transformations, possibly over long periods of time. As the quantity of data that contributes to a particular result increases, keeping track of how different sources and transformations are related to each other becomes more difficult. This constrains the ability to answer questions regarding a result's history, such as: What were the underlying assumptions on which the result is based? Under what conditions does it remain valid? What other results were derived from the same data sources?
doi:10.1145/2596628 fatcat:votcaprhhfe25laiohnm7tc7de