A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is application/pdf
.
Adding data provenance support to Apache Spark
2017
The VLDB journal
Debugging data processing logic in data-intensive scalable computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result, programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging. To aid this effort, we built Titian, a library that enables data provenance-tracking data through transformations-in Apache Spark. Data scientists using the Titian
doi:10.1007/s00778-017-0474-5
pmid:31007500
pmcid:PMC6474385
fatcat:ppvk5na66zdjrgpilsl7cprsdq