Qualitative data cleaning

Xu Chu, Ihab F. Ilyas
2016 Proceedings of the VLDB Endowment  
Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and wrong business decisions. A data cleaning exercise often consists of two phases: error detection and error repairing. Error detection techniques can be either quantitative or qualitative, and error repairing is performed by applying data transformation scripts, by involving human experts, or sometimes both. In this tutorial, we discuss the main facets and directions in designing qualitative data cleaning techniques. We present a taxonomy of current qualitative error detection techniques, as well as a taxonomy of current data repairing techniques. We will also discuss proposals for tackling the challenges of cleaning "big data" in terms of scale and distribution.
doi:10.14778/3007263.3007320 fatcat:5tnfp3bhqffdbpvqgjabp7ctoq