Distributed data provenance for large-scale data-intensive computing

Dongfang Zhao, Chen Shou, Tanu Malik, Ioan Raicu
2013 IEEE International Conference on Cluster Computing (CLUSTER)
It has become increasingly important to capture and understand the origins and derivation of data (its provenance). A key issue in evaluating the feasibility of data provenance is its performance, overheads, and scalability. In this paper, we explore the feasibility of a general metadata storage and management layer for parallel file systems, in which metadata includes both file operations and provenance metadata. We experimentally investi
doi:10.1109/cluster.2013.6702685 dblp:conf/cluster/ZhaoSMR13 fatcat:yxktbz7zu5gf7m2umseien36de