Web-Scale Blocking, Iterative and Progressive Entity Resolution
2017 IEEE 33rd International Conference on Data Engineering (ICDE)
Entity resolution aims to identify descriptions of the same entity within or across knowledge bases. In this work, we provide a comprehensive and cohesive overview of the key research results in the area of entity resolution. We are interested in frameworks that address the new challenges in entity resolution posed by the Web of data, in which real-world entities are described by interlinked data rather than documents. Since such descriptions are usually partial, overlapping and sometimes
... entity resolution emerges as a central problem, both for increasing dataset linking and for searching the Web of data for entities and their relations. We focus on Web-scale blocking, iterative and progressive solutions for entity resolution. Specifically, to reduce the required number of comparisons, blocking places similar descriptions into blocks, and comparisons to identify matches are executed only between descriptions within the same block. To minimize the number of missed matches, an iterative entity resolution process can exploit any intermediate results of blocking and matching to discover new candidate description pairs for resolution. Finally, we review work on progressive entity resolution, which attempts to discover as many matches as possible given a limited computing budget, by estimating the matching likelihood of yet unresolved descriptions based on the matches found so far.
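As an illustration of how blocking and a comparison budget interact, the following is a minimal Python sketch and not a method taken from the works surveyed. It assumes token blocking (one block per shared token) and Jaccard similarity as the matcher. The function names, the toy descriptions and the threshold are hypothetical. The ordering is a deliberate simplification: pairs are ranked once by the number of blocks they share, whereas the progressive approaches described above update their estimates from the matches found so far.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(descriptions):
    """Build one block per token: token -> ids of descriptions containing it."""
    blocks = defaultdict(set)
    for desc_id, text in descriptions.items():
        for token in set(text.lower().split()):
            blocks[token].add(desc_id)
    # A block holding a single description yields no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Count, for each candidate pair, the number of blocks it co-occurs in."""
    counts = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            counts[(a, b)] += 1
    return counts


def jaccard(a, b):
    """Token-set Jaccard similarity of two description strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def resolve_with_budget(descriptions, budget, threshold=0.5):
    """Compare at most `budget` pairs, most-shared-blocks first (assumed heuristic)."""
    counts = candidate_pairs(token_blocking(descriptions))
    ordered = sorted(counts, key=lambda pair: (-counts[pair], pair))
    matches = []
    for a, b in ordered[:budget]:
        if jaccard(descriptions[a], descriptions[b]) >= threshold:
            matches.append((a, b))
    return matches


# Hypothetical toy descriptions, for illustration only.
descriptions = {
    "d1": "Stanley Kubrick director",
    "d2": "Kubrick Stanley film director",
    "d3": "Alan Turing mathematician",
    "d4": "Turing Alan",
}

print(resolve_with_budget(descriptions, budget=1))
print(resolve_with_budget(descriptions, budget=2))
```

On this toy input, blocking leaves two candidate pairs instead of the six that exhaustive pairwise comparison would require, and a budget of one comparison is spent on the pair that shares the most blocks.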