35 Hits in 5.6 sec

A primer on provenance

Lucian Carata, Sherif Akoush, Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, Margo Seltzer, Andy Hopper
2014 Communications of the ACM  
Assessing the quality or validity of a piece of data is not usually done in isolation.  ...  You typically examine the context in which the data appears and try to determine its original sources or review the process through which it was created.  ...  Acknowledgments We would like to thank George Coulouris for his feedback and our reviewers for their constructive comments and suggestions.  ... 
doi:10.1145/2596628 fatcat:votcaprhhfe25laiohnm7tc7de

A Primer on Provenance

Lucian Carata, Sherif Akoush, Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, Margo Seltzer, Andy Hopper
2014 Queue  
Assessing the quality or validity of a piece of data is not usually done in isolation.  ...  You typically examine the context in which the data appears and try to determine its original sources or review the process through which it was created.  ...  ACKNOWLEDGMENTS We would like to thank George Coulouris for his feedback and support in shaping this article toward its current form, and our reviewers for their constructive comments and suggestions.  ... 
doi:10.1145/2602649.2602651 fatcat:p2swocpz2vhjfbffuch4oseexu

A survey of simulation provenance systems: modeling, capturing, querying, visualization, and advanced utilization

Young-Kyoon Suh, Ki Yong Lee
2018 Human-Centric Computing and Information Sciences  
In this manuscript we provide a comprehensive survey of a wide range of existing systems to utilize provenance data produced by simulation.  ...  In particular, we present a taxonomy of scientific platforms regarding provenance support and holistically tabulate the major functionalities and supporting levels of the studied systems.  ...  Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.  ... 
doi:10.1186/s13673-018-0150-9 fatcat:zmdmunmguvfelnlwcvnhw5wmpi

Big Provenance Stream Processing for Data Intensive Computations

Isuru Suriarachchi, Sachith Withana, Beth Plale
2018 2018 IEEE 14th International Conference on e-Science (e-Science)  
Big Provenance in DICs Fine-grained provenance captured from DICs is useful for debugging and monitoring computations, for tracing the origins of derived data, and for tracing the derivation paths for  ...  and Spark. Wang J. et al. [104] [34] present a way of capturing provenance in MapReduce workflows by integrating Hadoop into Kepler and using provenance capabilities of Kepler.  ...  of the WSO2 Application Server (May 2010 to July 2012), a Java runtime for hosting Web Services and Web Applications. • Implemented lazy loading for tenants and services in WSO2 Stratos (Platform as a  ... 
doi:10.1109/escience.2018.00039 dblp:conf/eScience/SuriarachchiWP18 fatcat:xrkgn66wkzhyxm2z6duattyqdy

Logical provenance in data-oriented workflows

R. Ikeda, Akash Das Sarma, J. Widom
2013 2013 IEEE 29th International Conference on Data Engineering (ICDE)  
in data-oriented workflows, as well as debugging and drill-down using logical provenance Overall, our work provides a comprehensive foundation, set of algorithms, and prototype system for provenance in  ...  We then: 1) Describe a wrapper-based approach for capturing provenance in workflows in which all transformations are either map or reduce functions 2) Describe a provenance-based approach for selectively  ...  RAMP System We have built a system called RAMP (Reduce And Map Provenance) for capturing and tracing provenance in GMRWs.  ... 
doi:10.1109/icde.2013.6544882 dblp:conf/icde/IkedaSW13 fatcat:vmwxwcwzfjhnjmyfomce6l5m2y

Scalable lineage capture for debugging DISC analytics

Dionysios Logothetis, Soumyarupa De, Kenneth Yocum
2013 Proceedings of the 4th annual Symposium on Cloud Computing - SOCC '13  
This paper presents Newt, a scalable architecture for capturing and using record-level data lineage to discover and resolve errors in analytics.  ...  As case studies, we instrument two DISC systems, Hadoop and Hyracks, with less than 105 lines of additional code for each.  ...  Vicky Papavasileiou and Zhaomo Yang assisted in considering how to apply Newt to graph processing systems, and Chris Olston provided valuable feedback on the research direction and early drafts.  ... 
doi:10.1145/2523616.2523619 dblp:conf/cloud/LogothetisDY13 fatcat:26nbpa2ubvgh3ko5kotqic3jrm

BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark

Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai Deep Tetali, Tyson Condie, Todd Millstein, Miryung Kim
2016 Proceedings of the 38th International Conference on Software Engineering - ICSE '16  
Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average.  ...  To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next generation data-intensive scalable cloud computing platform.  ...  We would also like to thank our industry partners at IBM and Intel for their gifts.  ... 
doi:10.1145/2884781.2884813 pmid:27390389 pmcid:PMC4933307 dblp:conf/icse/GulzarIYTCMK16 fatcat:atfa4b4cczehrkslaojivhkosi

Interactive and automated debugging for big data analytics

Muhammad Ali Gulzar
2018 Proceedings of the 40th International Conference on Software Engineering Companion Proceedings - ICSE '18  
We seek to address these challenges with the development of BIGDEBUG, a framework providing interactive debugging primitives and tool-assisted fault localization services for big data analytics.  ...  We showcase the data provenance and optimized incremental computation features to effectively and efficiently support interactive debugging, and investigate new research directions on how to automatically  ...  Current approaches to supporting data provenance in DISC systems (specifically RAMP [13] and Newt [18] ) cannot support interactive debugging.  ... 
doi:10.1145/3183440.3190334 dblp:conf/icse/Gulzar18 fatcat:o36lxubmjzfqxmv6p2kfkufkia

Distributed Ledger for Provenance Tracking of Artificial Intelligence Assets [article]

Philipp Lüthi, Thibault Gagnaux, Marcel Gygli
2020 arXiv pre-print
Provenance tracing systems are a possible measure to build trust by improving transparency.  ...  In this paper we design a graph-based provenance model for AI assets and their relations within an AI value chain. Moreover, we propose a protocol to exchange AI assets securely to selected parties.  ...  This work is supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract numbers 16.0159.  ... 
arXiv:2002.11000v1 fatcat:ys5buaglybeqbkm4mweof3mmnm

Explaining outputs in modern data analytics

Zaheer Chothia, John Liagouris, Frank McSherry, Timothy Roscoe
2016 Proceedings of the VLDB Endowment  
This choice allows our implementation to inherit the performance characteristics of differential dataflow, and results in a system that efficiently computes and updates explanatory inputs even as the inputs  ...  Second, we provide a general method for identifying explanations that are sufficient to reproduce the target output in arbitrary computations -a problem for which no viable solution existed until now.  ...  , Newt, and RAMP.  ... 
doi:10.14778/2994509.2994530 fatcat:ummh2n3gyjby5fjnjutcepdqmy

SubZero: A fine-grained lineage system for scientific databases

E. Wu, S. Madden, M. Stonebraker
2013 2013 IEEE 29th International Conference on Data Engineering (ICDE)  
Data lineage is a key component of provenance that helps scientists track and query relationships between input and output data.  ...  We use the insights to define lineage representations that efficiently capture common locality properties in the lineage data, and a set of APIs so operator developers can easily export lineage information  ...  RAMP [12] extends MapReduce to automatically generate lineage capturing wrappers around Map and Reduce operators.  ... 
doi:10.1109/icde.2013.6544881 dblp:conf/icde/0002MS13 fatcat:bxh4jmnzmfhvdk5qqqebmqkoy4

Tracing Distributed Data Stream Processing Systems

Zoltan Zvara, Peter G.N. Szabo, Gabor Hermann, Andras Benczur
2017 2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W)  
Existing tracing solutions are only suitable for single-system batch workloads, and solely provide debugging capabilities in most cases.  ...  We present a distributed, platform-wide tracing design and framework for production streaming applications that helps to solve a variety of optimization problems in real-time.  ...  RAMP [17] wraps map and reduce functions in Hadoop to achieve backward and forward tracing.  ... 
doi:10.1109/fas-w.2017.153 dblp:conf/saso/ZvaraSHB17 fatcat:sacald7cenekxjmvjwll6umpse

29th International Conference on Data Engineering [book of abstracts]

2013 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW)  
We provide an algorithm for provenance tracing in workflows where logical provenance for each transformation is specified using our language.  ...  Logical Provenance in Data-Oriented Workflows Robert Ikeda, Akash Das Sarma, Jennifer Widom (Stanford University) We consider the problem of defining, generating, and tracing provenance in data-oriented  ...  These volunteers welcome participants, give directions, help in the sessions and on the registration desk, and generally make sure the conference is running smoothly.  ... 
doi:10.1109/icdew.2013.6547409 fatcat:wadzpuh3b5htli4mgb4jreoika

List of Contributors [chapter]

Shibakali Gupta, Indradip Banerjee, Siddhartha Bhattacharyya
2019 Big Data Security  
Big data workflows need to discover modeling and capturing provenance information by a small number of key learning.  ...  RAMP (reduce and map provenance) is an augmentation to Hadoop that underpins provenance catch and following for MapReduce work processes ().  ...  The proper analysis of static and streaming datasets in big data technology can assure to develop applications on medical and other scientific relevance and thus there can be a huge scope of business opportunities  ... 
doi:10.1515/9783110606058-202 fatcat:3jtqdtgsavas7n3vxrtbrkdbdy

Cloud-based design and manufacturing: A new paradigm in digital manufacturing and design innovation

Dazhong Wu, David W. Rosen, Lihui Wang, Dirk Schaefer
2015 Computer-Aided Design  
in a holistic sense.  ...  a systematic requirements checklist that an idealized CBDM system should satisfy, and compare CBDM to other relevant but more traditional collaborative design and distributed manufacturing systems such  ...  Highlight: We present a new paradigm in digital manufacturing and design innovation, namely cloud-based design and manufacturing (CBDM). We identify the common key characteristics of CBDM.  ... 
doi:10.1016/j.cad.2014.07.006 fatcat:rqtymdaxbrbjpnbqv6cgoa3y6u