Entity Resolution with crowd errors

Vasilis Verroios, Hector Garcia-Molina
2015 IEEE 31st International Conference on Data Engineering
Given a set of records, an Entity Resolution (ER) algorithm finds records that refer to the same real-world entity. Humans can often determine whether two records refer to the same entity, and hence we study the problem of selecting questions to ask error-prone humans. We give a Maximum Likelihood formulation for the problem of finding the "most beneficial" questions to ask next. Our theoretical results lead to a lightweight and practical algorithm, bDENSE, for selecting questions to ask humans. Our experimental results show that bDENSE can more quickly reach an accurate outcome, compared to two approaches proposed recently. Moreover, through our experimental evaluation, we identify the strengths and weaknesses of all three approaches.
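
To make the Maximum Likelihood idea concrete, the following is a minimal sketch and not the bDENSE algorithm from the paper. It assumes every human answer is wrong independently with a single known probability (`error_rate`, an assumption introduced here) and finds, by brute force, the clustering of records under which the collected answers are most probable. The function names and the toy data are hypothetical.

```python
import math

def log_likelihood(clusters, answers, error_rate):
    """Log-probability of the answers if `clusters` were the true partition."""
    label = {rec: idx for idx, group in enumerate(clusters) for rec in group}
    total = 0.0
    for a, b, says_same in answers:
        truly_same = label[a] == label[b]
        # Assumed noise model: an answer is correct with probability 1 - error_rate.
        p = 1 - error_rate if says_same == truly_same else error_rate
        total += math.log(p)
    return total

def partitions(items):
    """Yield every partition of `items` (only feasible for very small inputs)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Place `first` into each existing group in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a new group of its own.
        yield [[first]] + part

def most_likely_clustering(records, answers, error_rate):
    """Return the partition that maximizes the likelihood of the answers."""
    return max(partitions(records),
               key=lambda c: log_likelihood(c, answers, error_rate))

# Toy data: each tuple is (record, record, "same entity?" answer from one human).
answers = [("a", "b", True), ("a", "b", True), ("a", "b", False),
           ("b", "c", False), ("a", "c", False)]
best = most_likely_clustering(["a", "b", "c"], answers, error_rate=0.2)
```

Under this noise model, two "same" answers outweigh one "different" answer for the pair (a, b), so the most likely clustering groups `a` with `b` and leaves `c` alone. Enumerating all partitions grows far too quickly to be usable beyond a handful of records, which is one reason a lightweight question-selection algorithm is of practical interest.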
doi:10.1109/icde.2015.7113286 dblp:conf/icde/VerroiosG15 fatcat:og5jjpyuubahpejw434z7tjosi