Crowd-Sourced Entity Resolution with Control Queries

Sainyam Galhotra, Donatella Firmani, Barna Saha, Divesh Srivastava
2019 Sistemi Evoluti per Basi di Dati  
Entity resolution (ER) seeks to identify which records in a data set refer to the same real-world entity. Given the diversity of ways in which entities can be represented, ER is known to be a challenging task for automated strategies, but relatively easier for expert humans. Nonetheless, humans can also make mistakes. Our contribution is an error correction toolkit that can be leveraged by a variety of hybrid human-machine ER algorithms, based on a formal way of selecting "control queries" for the human experts. We demonstrate empirically that older ER algorithms equipped with our tool can perform even better than most recent ER methods with built-in error correction.
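To make the idea of a "control query" concrete, here is a minimal sketch under stated assumptions. It is not the authors' selection method. The names `resolve_with_controls` and `make_faulty_oracle`, the greedy incremental clustering, and the majority-vote rule are all hypothetical choices made for illustration. The sketch assumes that a positive human answer ("record r matches this cluster") should be confirmed by extra queries pairing r with other cluster members. By transitivity, those queries should also come back positive, so a lone wrong "yes" gets outvoted. This simple version only guards against false positives.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical crowd oracle: answers "do records a and b refer to the same entity?"
Oracle = Callable[[str, str], bool]


def resolve_with_controls(records: Sequence[str], ask: Oracle,
                          num_controls: int, rng: random.Random) -> List[List[str]]:
    """Hypothetical greedy incremental ER: each positive answer is re-checked
    with up to `num_controls` control queries against other cluster members."""
    clusters: List[List[str]] = []
    for r in records:
        placed = False
        for cluster in clusters:
            # Primary query: compare r with the cluster's first record.
            if not ask(r, cluster[0]):
                continue
            # Control queries: pair r with other members of the same cluster;
            # by transitivity these should also be answered "same entity".
            others = cluster[1:]
            controls = rng.sample(others, min(num_controls, len(others)))
            yes_votes = 1 + sum(ask(r, m) for m in controls)
            total = 1 + len(controls)
            # Merge only if a strict majority of all answers say "same entity".
            if 2 * yes_votes > total:
                cluster.append(r)
                placed = True
                break
        if not placed:
            clusters.append([r])
    return clusters


def make_faulty_oracle(truth: Dict[str, str], wrong_pairs: List[frozenset]) -> Oracle:
    """Hypothetical oracle that is correct except on the listed pairs, where it flips its answer."""
    def ask(a: str, b: str) -> bool:
        same = truth[a] == truth[b]
        return not same if frozenset((a, b)) in wrong_pairs else same
    return ask


if __name__ == "__main__":
    truth = {"a1": "A", "a2": "A", "b1": "B", "a3": "A", "b2": "B"}
    # The oracle wrongly claims that a1 and b1 are the same entity.
    ask = make_faulty_oracle(truth, [frozenset(("a1", "b1"))])
    records = ["a1", "a2", "b1", "a3", "b2"]
    print(resolve_with_controls(records, ask, num_controls=0, rng=random.Random(0)))
    # [['a1', 'a2', 'b1', 'a3'], ['b2']]
    print(resolve_with_controls(records, ask, num_controls=1, rng=random.Random(0)))
    # [['a1', 'a2', 'a3'], ['b1', 'b2']]
```

Without control queries, the single wrong answer merges b1 into the A cluster. With one control query, the follow-up question (b1, a2) is answered correctly and outvotes it. A real toolkit would also need to choose control queries to catch false negatives and to balance their cost against the accuracy gained.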
dblp:conf/sebd/GalhotraFSS19 fatcat:7punhdtphnalvi4ym5n6unylhi