Blocking and Filtering Techniques for Entity Resolution

George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas
2020 ACM Computing Surveys  
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that correspond to the same real-world object. Due to its inherently quadratic complexity, a series of techniques accelerate it so that it scales to voluminous data. In this survey, we review a large number of relevant works under two different but related frameworks: Blocking and Filtering. The former restricts comparisons to entity pairs that are more likely to match, while the latter identifies quickly entity pairs that are likely to satisfy predetermined similarity thresholds. We also elaborate on hybrid approaches that combine different characteristics. For each framework we provide a comprehensive list of the relevant works, discussing them in the greater context. We conclude with the most promising directions for future work in the field.
doi:10.1145/3377455 fatcat:uuzuuxwwzrfg7cwfwzswdqvklm