A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2016.
The file type is application/pdf.
Schema-agnostic vs schema-based configurations for blocking methods on homogeneous data
2015
Proceedings of the VLDB Endowment
Entity Resolution constitutes a core task for data integration that, due to its quadratic complexity, typically scales to large datasets through blocking methods. These can be configured in two ways. The schema-based configuration relies on schema information in order to select signatures of high distinctiveness and low noise, while the schema-agnostic one treats every token from all attribute values as a signature. The latter approach has significant potential, as it requires no fine-tuning by
doi:10.14778/2856318.2856326
fatcat:rfuywydydbbmlf6xfe72ug4w44
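
The contrast drawn in the abstract between the two configurations can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the sample records, and the choice of "tokens of one hand-picked attribute" as the schema-based signature are all assumptions made for illustration. The schema-agnostic variant follows the abstract's description by treating every token from all attribute values as a signature.

```python
import re
from collections import defaultdict


def tokenize(value):
    """Lowercase a value and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", str(value).lower()))


def schema_agnostic_blocks(records):
    """Every token of every attribute value is a signature (block key)."""
    blocks = defaultdict(set)
    for rid, record in records.items():
        for value in record.values():
            for token in tokenize(value):
                blocks[token].add(rid)
    # A block with a single record yields no comparisons, so drop it.
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def schema_based_blocks(records, attribute):
    """Only tokens of one chosen attribute are signatures.

    Assumption: picking the attribute by hand stands in for the
    schema knowledge that the schema-based configuration relies on.
    """
    blocks = defaultdict(set)
    for rid, record in records.items():
        for token in tokenize(record.get(attribute, "")):
            blocks[token].add(rid)
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct record pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        ordered = sorted(ids)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs


# Hypothetical records sharing one schema (homogeneous data).
records = {
    1: {"name": "John Smith", "city": "Boston"},
    2: {"name": "Jon Smith", "city": "Boston"},
    3: {"name": "Boston Tea Co", "city": "Salem"},
}

agnostic_pairs = candidate_pairs(schema_agnostic_blocks(records))
based_pairs = candidate_pairs(schema_based_blocks(records, "name"))
```

On these sample records the schema-agnostic configuration proposes all three pairs, because the token "boston" appears in a name as well as in two city values, while the schema-based configuration restricted to `name` proposes only the pair (1, 2). This shows the trade-off in miniature: the schema-agnostic variant needs no attribute selection but produces more candidate comparisons.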