

Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Neville, Mourad Ouzzani
2010 Proceedings of the 2010 international conference on Management of data - SIGMOD '10  
GDR quantifies the data quality benefit of generated repairs by combining mechanisms from decision theory and active learning.  ...  We present a demo of GDR, a Guided Data Repair system that uses a novel approach to efficiently involve the user alongside automatic data repair techniques to reach better data quality as quickly as possible  ...  A general and recent domain independent approach for improving data quality is to (i) discover and identify some data quality rules (DQRs), and then, (ii) use these rules to derive data repairs for dirty  ... 
doi:10.1145/1807167.1807325 dblp:conf/sigmod/YakoutENO10 fatcat:lca4caxuxzd3bgr4jpfurk3api


Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Nan Tang
2013 Proceedings of the 2013 international conference on Management of data - SIGMOD '13  
The programming interface allows the users to specify multiple types of data quality rules, which uniformly define what is wrong with the data and (possibly) how to repair it through writing code that  ...  We show that the programming interface can be used to express many types of data quality rules beyond the well known CFDs (FDs), MDs and ETL rules.  ...  We can see that (1) when taken together, different data quality rules help each other, and (2) to make practical use of their interaction, repairing operations for various types of data quality rules should  ... 
doi:10.1145/2463676.2465327 dblp:conf/sigmod/DallachiesaEEEIOT13 fatcat:ckbqrdcehjdptd2rt3zn6eiw7y

Guided data repair

Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Neville, Mourad Ouzzani, Ihab F. Ilyas
2011 Proceedings of the VLDB Endowment  
We empirically evaluate GDR on a real-world dataset and show significant improvement in data quality using our user-guided repairing process.  ...  We also assess the trade-off between the user efforts and the resulting data quality.  ...  We also consider a set of data quality rules Σ that represent data integrity semantics. In this paper, we consider rules in the form of CFDs.  ... 
doi:10.14778/1952376.1952378 fatcat:zvm3hh47mvcllhdeomtqqtfrku

An Association Rules-Based Method for Outliers Cleaning of Measurement Data in the Distribution Network

Hua Kuang, Risheng Qin, Mi He, Xin He, Ruimin Duan, Cheng Guo, Xian Meng
2021 Frontiers in Energy Research  
The method is based on a set of association rules (AR) that are automatically generated from historical measurement data.  ...  In order to improve the data quality, the outliers cleaning method for measurement data in the distribution network is studied in this paper.  ...  AUTHOR CONTRIBUTIONS Conception and design of study: HK; Acquisition of data: MH, XH; Drafting the article: RQ, XH; Analysis and interpretation of data: HK, RQ, CG, and XM; Revising the article critically  ... 
doi:10.3389/fenrg.2021.730058 fatcat:hpdeiidhjnezpj4ygfitpvlone

Data Cleaning

Xu Chu, Ihab F. Ilyas, Sanjay Krishnan, Jiannan Wang
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and failure to do so can result in inaccurate analytics and unreliable decisions.  ...  To better understand the new advances in the field, we will first present a taxonomy of the data cleaning literature in which we highlight the recent interest in techniques that use constraints, rules,  ...  , and taking user feedback in discovering of data quality rules, is yet to be explored.  ... 
doi:10.1145/2882903.2912574 dblp:conf/sigmod/ChuIKW16 fatcat:4htyrvwp3fafjgvrchb3rwjtym

Autonomous recovery from hostile code insertion using distributed reflection

Catriona M Kennedy, Aaron Sloman
2003 Cognitive Systems Research  
Other components monitor "quality" of performance in the application domain.  ...  Some reflective (or "meta-level") components enable the system to monitor its execution traces and detect anomalies by comparing them with a model of normal activity.  ...  7 = suppression of hostile code; 8 = data recovery request; 9 = repair request; 10 = data recovery; 11 = repair. 1 = external sensors; 2 = external effectors; 3 = pattern- and quality-monitoring  ... 
doi:10.1016/s1389-0417(02)00096-7 fatcat:3ajh52kjtfgxdjculbuh6ntbmq

Extraction of Missing Tendency Using Decision Tree Learning in Business Process Event Log

Hiroki Horita, Yuta Kurihashi, Nozomi Miyamori
2020 Data  
We conducted experiments using data from the incident management system and confirmed the effectiveness of our method.  ...  The event log may contain missing data due to technical or human error, and if the data are missing, the analysis results will be inadequate.  ...  Acknowledgments: We would like to thank the members of our laboratory for discussing this study. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/data5030082 fatcat:q7hswyier5c35pva5dgprcccxe

A Review of Data Cleaning Methods for Web Information System

Jinlin Wang, Xing Wang, Yuchen Yang, Hongli Zhang, Binxing Fang
2019 Computers Materials & Continua  
Then, after elaborating and analyzing each category, we summarize the descriptions and challenges of data cleaning methods with sub-elements such as data & user interaction, data quality rule, model, crowdsourcing  ...  Data cleaning plays an essential role in various WIS scenarios to improve the quality of data service. In this paper, we present a review of the state-of-the-art methods for data cleaning in WIS.  ...  outperforms previous algorithms in terms of quality and efficiency of the repair [Dallachiesa, Ebaid, Eldawy et al. (2013)] (1) Allow the users to specify multiple types of data quality rules (2) Allow  ... 
doi:10.32604/cmc.2020.08675 fatcat:jusi6zu7rzg65po5sowrpxlwxm

BART in Action

Donatello Santoro, Patricia C. Arocena, Boris Glavic, Giansalvatore Mecca, Renée J. Miller, Paolo Papotti
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management.  ...  Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.  ...  Denial constraints are a very expressive language, capable of capturing most data-quality rules used for data-repairing, including FDs, CFDs, cleaning equality-generating dependencies, editing rules, fixing  ... 
doi:10.1145/2882903.2899397 dblp:conf/sigmod/SantoroAGMMP16 fatcat:wpd2kur3angfvmjf4sgzfzshze

Qualitative data cleaning

Xu Chu, Ihab F. Ilyas
2016 Proceedings of the VLDB Endowment  
Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and wrong business decisions.  ...  Data cleaning exercises often consist of two phases: error detection and error repairing.  ...  I of schema R and a set of data quality requirements expressed in a variety of ways, data repairing refers to the process of finding another database instance I that conforms to the set of data quality  ... 
doi:10.14778/3007263.3007320 fatcat:5tnfp3bhqffdbpvqgjabp7ctoq

That's all folks!

Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro
2014 Proceedings of the VLDB Endowment  
Unfortunately, schema-mappings and data quality rules interact with each other, so that applying existing algorithms in a pipelined way (i.e., first exchange the data, then repair the result) does not  ...  techniques to repair the data.  ...  It is natural to think of data exchange and data repairing as two strongly interrelated activities.  ... 
doi:10.14778/2733004.2733031 fatcat:r6nxerxanbeihp7updvzv3ks3q


Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, Christopher Ré
2017 Proceedings of the VLDB Endowment  
HoloClean unifies qualitative data repairing, which relies on integrity constraints or external data sources, with quantitative data repairing methods, which leverage statistical properties of the input  ...  We show that HoloClean finds data repairs with an average precision of ∼ 90% and an average recall of above ∼ 76% across a diverse array of datasets exhibiting different types of errors.  ...  The authors would like to thank the members of the Hazy Group for their feedback and help.  ... 
doi:10.14778/3137628.3137631 fatcat:rdkkfljgdrgcjkpiggkol3mu7m

Interactive and Deterministic Data Cleaning

Jian He, Enzo Veltri, Donatello Santoro, Guoliang Li, Giansalvatore Mecca, Paolo Papotti, Nan Tang
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
Falcon does not rely on the existence of a set of pre-defined data quality rules. On the contrary, it encourages users to explore the data, identify possible problems, and make updates to fix them.  ...  Bootstrapped by one user update, Falcon guesses a set of possible SQL update queries that can be used to repair the data.  ...  GDR ("Guided Data Repairs") is a recently proposed algorithm that relies on active learning in order to improve the quality of repairs.  ... 
doi:10.1145/2882903.2915242 dblp:conf/sigmod/HeVSLMPT16 fatcat:ob7xk77gofgc5mynsfjkvhzx5i

A revival of integrity constraints for data cleaning

Wenfei Fan, Floris Geerts, Xibei Jia
2008 Proceedings of the VLDB Endowment  
Integrity constraints, a.k.a. data dependencies, are being widely used for improving the quality of schema. Recently constraints have enjoyed a revival for improving the quality of data.  ...  The tutorial aims to provide an overview of recent advances in constraint-based data cleaning.  ...  Open Problems and Emerging Applications The study of constraint-based data cleaning has raised as many questions as it has answered. References  ... 
doi:10.14778/1454159.1454220 fatcat:zomyj7tafraezjhul6zr45wpsu

Generic and Declarative Approaches to Data Quality Management [chapter]

Leopoldo Bertossi, Loreto Bravo
2013 Handbook of Data Quality  
Data quality assessment and data cleaning tasks have traditionally been addressed through procedural solutions.  ...  In the last few years we have seen the emergence of more generic solutions; and also of declarative and rule-based specifications of the intended solutions of data cleaning processes.  ...  We are grateful to our research collaborators with whom part of the research described here has been carried out.  ... 
doi:10.1007/978-3-642-36257-6_9 fatcat:corj5f7yunfizby2ouib3duu5m