Q-Data: Using Deductive Database Technology to Improve Data Quality [chapter]

Amit Sheth, Christopher Wood, Vipul Kashyap
1995 Applications of Logic Databases  
This chapter discusses Q-Data, an extended deductive database prototype system developed by Bellcore to improve data quality through data validation and cleanup. The key technology component of Q-Data is the extended deductive database system LDL++, developed at MCC. We discuss the issues of data quality improvement, the relevance of deductive database technology such as the LDL++ system to data quality improvement tasks, and the system architecture of the prototype. Furthermore, we describe our experiences using deductive database technology in an ongoing Q-Data trial that attacks a real-world problem with test data from operational systems. Experiences related to engineering aspects of both the deductive database system and other component technologies, as well as pragmatic aspects of implementing Q-Data as a distributed system, are discussed.

... quality has received little attention in the database literature [8]. A significant percentage of the data in most companies is of poor quality [12]. The important dimensions of data quality include accuracy or correctness, completeness, consistency, and currentness [5]. Examples of poor data quality include errors in input data (e.g., a partial or nonexistent address), data inconsistencies (e.g., different billing addresses for the same customer, or an incorrect Zip code for the location), and unintended duplication or redundancy (e.g., multiple customer records arising from different representations of the same customer, such as DEC, Digital Equip. Corp., and Digital Equipment Corporation). Such problems are often caused by duplicate or redundant data produced by different processes and organizations. Poor data quality results from a variety of factors, including flawed data acquisition and data creation processes, flawed data update processes, the inability to enforce constraints among related data in multiple databases [7], duplicate data produced by different methods, organizations, and processes, and process re-engineering and company reorganizations. Two of the most frequent manifestations of poor data quality are:
doi:10.1007/978-1-4615-2207-2_2
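The three kinds of poor data quality described above (duplicate representations of one customer, inconsistent billing addresses, and an incorrect Zip code for a location) lend themselves to being stated as declarative rules over relations. The following is a minimal Python sketch of that idea, not LDL++ code and not Q-Data's actual rule base. The relation names (`RECORDS`, `ALIAS`, `ZIP_OF_CITY`), the addresses, towns, and Zip codes are all hypothetical and made up for illustration; only the DEC name variants come from the text. Each function's docstring gives the rule in generic Datalog-style notation, and the body evaluates it with a brute-force set comprehension.

```python
# Hypothetical sample facts (made up for illustration, not Q-Data test data).
# record(Id, Name, BillingAddress, City, Zip)
RECORDS = {
    (1, "DEC", "100 Main St", "Town-A", "11111"),
    (2, "Digital Equip. Corp.", "200 Oak Ave", "Town-A", "11111"),
    (3, "Digital Equipment Corporation", "100 Main St", "Town-A", "99999"),
    (4, "Acme Inc.", "5 Elm St", "Town-B", "22222"),
}

# alias(Variant, Canonical), stored as a lookup table of name variants.
ALIAS = {
    "DEC": "Digital Equipment Corporation",
    "Digital Equip. Corp.": "Digital Equipment Corporation",
}

# zip_of_city(City, Zip): hypothetical reference facts treated as ground truth.
ZIP_OF_CITY = {("Town-A", "11111"), ("Town-B", "22222")}


def canonical(name: str) -> str:
    """canonical(N, C) :- alias(N, C).  A name with no alias is its own canonical form."""
    return ALIAS.get(name, name)


def duplicates() -> set[tuple[int, int]]:
    """duplicate(I1, I2) :- record(I1, N1, ...), record(I2, N2, ...),
    I1 < I2, canonical(N1, C), canonical(N2, C)."""
    return {
        (r1[0], r2[0])
        for r1 in RECORDS
        for r2 in RECORDS
        if r1[0] < r2[0] and canonical(r1[1]) == canonical(r2[1])
    }


def inconsistent_billing() -> set[str]:
    """inconsistent(C) :- record(_, N1, A1, ...), record(_, N2, A2, ...),
    canonical(N1, C), canonical(N2, C), A1 != A2."""
    return {
        canonical(r1[1])
        for r1 in RECORDS
        for r2 in RECORDS
        if canonical(r1[1]) == canonical(r2[1]) and r1[2] != r2[2]
    }


def bad_zip() -> set[int]:
    """bad_zip(I) :- record(I, _, _, City, Zip), not zip_of_city(City, Zip)."""
    return {r[0] for r in RECORDS if (r[3], r[4]) not in ZIP_OF_CITY}


if __name__ == "__main__":
    print(sorted(duplicates()))
    print(sorted(inconsistent_billing()))
    print(sorted(bad_zip()))
```

On these sample facts, records 1, 2, and 3 are flagged as pairwise duplicates, "Digital Equipment Corporation" is flagged for having two different billing addresses, and record 3 is flagged for a Zip code that does not match its town. A deductive database would evaluate such rules declaratively (and could handle recursive rules); this sketch simply evaluates each rule once with nested iteration, which is quadratic in the number of records.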