A unified model for data and constraint repair

Fei Chiang, Renee J. Miller
2011 2011 IEEE 27th International Conference on Data Engineering  
In this work, we present a novel unified cost model that allows data and constraint repairs to be compared on an equal footing.  ...  In such settings, when an inconsistency occurs, it is no longer clear if there is an error in the data (and the data should be repaired), or if the constraints have evolved (and the constraints should  ...  A UNIFIED REPAIR MODEL One of the contributions of our work is a new cost model that quantifies the trade-off of when an inconsistency is a true data error (warranting a data repair) vs. an update to the  ... 
doi:10.1109/icde.2011.5767833 dblp:conf/icde/ChiangM11 fatcat:ei5xigeypzb45i2w66n32ua6e4

Continuous data cleaning

Maksims Volkovs, Fei Chiang, Jaroslaw Szlichta, Renee J. Miller
2014 2014 IEEE 30th International Conference on Data Engineering  
Recently, unified approaches that repair both errors in data and errors in semantics (the constraints) have been proposed.  ...  However, both data-only approaches and unified approaches are by and large static in that they apply cleaning to a single snapshot of the data and constraints.  ... 
doi:10.1109/icde.2014.6816655 dblp:conf/icde/VolkovsCSM14 fatcat:ozssye5kcndr3bamm7mjwio37q

Constraint-Variance Tolerant Data Repairing

Shaoxu Song, Han Zhu, Jianmin Wang
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
To address the oversimplified and overrefined constraint inaccuracies, in this paper, we propose to repair data by allowing a small variation (with both predicate insertion and deletion) on the constraints  ...  Results on real data sets demonstrate that our proposal can capture more accurate data repairs compared to the existing methods with/without constraint repairs.  ...  Acknowledgement This work is supported in part by the Tsinghua University Initiative Scientific Research Program; Tsinghua National Laboratory Special Fund for Big Data Science and Technology; China NSFC  ... 
doi:10.1145/2882903.2882955 dblp:conf/sigmod/SongZW16 fatcat:3bv55leilfbalcdw7bmg27mdze

HoloClean: Holistic Data Repairs with Probabilistic Inference [article]

Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, Christopher Ré
2017 arXiv   pre-print
HoloClean unifies existing qualitative data repairing approaches, which rely on integrity constraints or external data sources, with quantitative data repairing methods, which leverage statistical properties  ...  We show that HoloClean scales to instances with millions of tuples and find data repairs with an average precision of ~90% and an average recall of above ~76% across a diverse array of datasets exhibiting  ...  We start with Holistic, which relies only on logical constraints and performs repairs to individual cells iteratively until no constraints are violated.  ... 
arXiv:1702.00820v1 fatcat:a2a5iroro5e2beqslb5oyn4p64

HoloClean

Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, Christopher Ré
2017 Proceedings of the VLDB Endowment  
HoloClean unifies qualitative data repairing, which relies on integrity constraints or external data sources, with quantitative data repairing methods, which leverage statistical properties of the input  ...  We show that HoloClean finds data repairs with an average precision of ∼ 90% and an average recall of above ∼ 76% across a diverse array of datasets exhibiting different types of errors.  ...  -15-C-4043), and XDATA (FA8750-12-2-0335) programs, and the Office of Naval Research (N000141210041 and N000141310129).  ... 
doi:10.14778/3137628.3137631 fatcat:rdkkfljgdrgcjkpiggkol3mu7m

That's all folks!

Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro
2014 Proceedings of the VLDB Endowment  
LLUNATIC is based on the intuition that transforming and cleaning data are different facets of the same problem, unified by their declarative nature.  ...  techniques to repair the data.  ...  and the repairing of data quality constraints does not terminate.  ... 
doi:10.14778/2733004.2733031 fatcat:r6nxerxanbeihp7updvzv3ks3q

Coherent Integration of Databases by Abductive Logic Programming

O. Arieli, M. Denecker, B. Van Nuffelen, M. Bruynooghe
2004 The Journal of Artificial Intelligence Research  
The outcome is an abductive-based application that is sound and complete with respect to a corresponding model-based, preferential semantics, and -- to the best of our knowledge -- is more expressive (  ...  Abstract: We introduce an abductive method for a coherent integration of independent data-sources.  ...  Acknowledgements We would like to thank the anonymous reviewers for many helpful comments and suggestions. This research was supported by the Research Fund K.U.Leuven and by FWO-Vlaanderen.  ... 
doi:10.1613/jair.1322 fatcat:vdiqbqpymnafjdexvspcpmteti

Repairing Inconsistent Databases: A Model-Theoretic Approach and Abductive Reasoning [article]

Ofer Arieli, Maurice Bruynooghe (The Catholic University of Leuven, Belgium)
2002 arXiv   pre-print
The two approaches for coherent data integration are related by soundness and completeness results.  ...  In this paper we consider two points of view on the problem of coherent integration of distributed data. First we give a pure model-theoretic analysis of the possible ways to 'repair' a database.  ...  In particular, facts that are specified in a particular database may violate some integrity constraints defined elsewhere, and so they might contradict some elements in the unified set of integrity constraints  ... 
arXiv:cs/0207085v1 fatcat:xzrzqrcbjnctdfu57l5oj6g5cm

Combining quantitative and logical data cleaning

Nataliya Prokoshyna, Jaroslaw Szlichta, Fei Chiang, Renée J. Miller, Divesh Srivastava
2015 Proceedings of the VLDB Endowment  
Quantitative data cleaning relies on the use of statistical methods to identify and repair data quality problems while logical data cleaning tackles the same problems using various forms of logical reasoning  ...  are considered to be a data quality problem, and (ii) repairs that modify the inconsistent data so as to minimize statistical distortion, measured using the Earth Mover's Distance.  ...  Comparative Study We compare our algorithm against another algorithm, the Unified Repair Model by Chiang and Miller [8], that performs data repairs as well as constraint repairs, but for FDs, not metric  ... 
doi:10.14778/2856318.2856325 fatcat:rfn7nt7nnjhchk25ibpifcoetq

Consistent query answering via ASP from different perspectives: Theory and practice

MARCO MANNA, FRANCESCO RICCA, GIORGIO TERRACINA
2012 Theory and Practice of Logic Programming  
Abstract: A data integration system provides transparent access to different data sources by suitably combining their data, and providing the user with a unified view of them, called global schema.  ...  However, source data are generally not under the control of the data integration process; thus, integrated data may violate global integrity constraints even in the presence of locally consistent data  ...  Roughly speaking, a data integration system provides transparent access to different data sources by suitably combining their data, and providing the user with a unified view of them, called global schema  ... 
doi:10.1017/s1471068411000640 fatcat:mg6hh3fbp5horo622nq5w6xzmm

Error-Tolerant Agents [chapter]

Thomas Eiter, Viviana Mascardi, V. S. Subrahmanian
2002 Lecture Notes in Computer Science  
or interactions with other agents that are unaffected by repairs.  ...  data structures and repair actions which are to be used by the recovery component.  ...  More importantly, in our framework, agents take "repair" actions automatically when confronted with such situations, but while taking such repair actions, they can often continue to engage in work and/  ...  This work was supported in part by the Austrian Science A Appendix: Feasible, Rational, and Reasonable Status Sets This appendix provides in succinct form the definition of various concepts of status  ... 
doi:10.1007/3-540-45628-7_22 fatcat:nz32yvxuzzf33ll26pzgduofaq

A Hybrid Data Cleaning Framework using Markov Logic Networks [article]

Yunjun Gao, Congcong Ge, Xiaoye Miao, Haobo Wang, Bin Yao, Qing Li
2019 arXiv   pre-print
MLNClean mainly consists of two cleaning stages, namely, first cleaning multiple data versions separately (each of which corresponds to one data rule), and then deriving the final clean data based on multiple  ...  With the increase of dirty data, data cleaning turns into a crux of data analysis.  ...  Thereafter, the state-of-the-art method HoloClean [22] unifies several data repair signals including integrity constraints and external data to construct a knowledge-base probabilistic graphical model  ... 
arXiv:1903.05826v1 fatcat:dz3rwkqjqbfk7h3nd247n2tt6e

Data Quality Problems beyond Consistency and Deduplication [chapter]

Wenfei Fan, Floris Geerts, Shuai Ma, Nan Tang, Wenyuan Yu
2013 Lecture Notes in Computer Science  
Recent work on data quality has primarily focused on data repairing algorithms for improving data consistency and record matching methods for data deduplication.  ...  This paper accentuates several other challenging issues that are essential to developing data cleaning systems, namely, error correction with performance guarantees, unification of data repairing and record  ...  Unifying matching and repairing, we state the data cleaning problem as follows.  ... 
doi:10.1007/978-3-642-41660-6_12 fatcat:kfbpssruungy5cy67k4b7wvwmi

Qualitative data cleaning

Xu Chu, Ihab F. Ilyas
2016 Proceedings of the VLDB Endowment  
Data cleaning exercises often consist of two phases: error detection and error repairing.  ...  Error detection techniques can either be quantitative or qualitative; and error repairing is performed by applying data transformation scripts or by involving human experts, and sometimes both.  ...  the data and the constraints [4].  ... 
doi:10.14778/3007263.3007320 fatcat:5tnfp3bhqffdbpvqgjabp7ctoq

Toward unification of taxonomy databases in a distributed computer environment

H Kitakami, Y Tateno, T Gojobori
1994 Proceedings. International Conference on Intelligent Systems for Molecular Biology  
A repair system is needed to remove inconsistencies in each data bank and mismatches among data banks.  ...  The goal of the present study is to unify the existent taxonomy databases and eliminate inconsistencies (errors) that are present in them.  ...  Thanks also to Rainer Fuchs and David Hazledine of EMBL Data Library for their help in automatically obtaining the EMBL-taxonomy database with the relational model.  ... 
pmid:7584395 fatcat:a33cbvxkcneljkjvemxvgsk2hq