A unified approach for schema matching, coreference and canonicalization

Michael L. Wick, Khashayar Rohanimanesh, Karl Schultz, Andrew McCallum
2008 Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD 08  
The automatic consolidation of database records from many heterogeneous sources into a single repository requires solving several information integration tasks. Although tasks such as coreference, schema matching, and canonicalization are closely related, they are most commonly studied in isolation. Systems that do tackle multiple integration problems traditionally solve each independently, allowing errors to propagate from one task to another. In this paper, we describe a
more » ... ed model that reasons about schema matching, coreference, and canonicalization jointly. We evaluate our model on a real-world data set of people and demonstrate that simultaneously solving these tasks reduces errors over a cascaded or isolated approach. Our experiments show that a joint model is able to improve substantially over systems that either solve each task in isolation or with the conventional cascade. We demonstrate nearly a 50% error reduction for coreference and a 40% error reduction for schema matching.
doi:10.1145/1401890.1401977 dblp:conf/kdd/WickRSM08 fatcat:supejrgzyfbqfphd3br6nnb3ey