Learning an accurate entity resolution model from crowdsourced labels

Jingjing Wang, Satoshi Oyama, Masahito Kurihara, Hisashi Kashima
2014 Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication - ICUIMC '14  
We investigated the use of supervised learning methods that use labels from crowd workers to resolve entities. Although obtaining labeled data by crowdsourcing can reduce time and cost, it also brings challenges (e.g., coping with the variable quality of crowd-generated data). First, we evaluated the quality of crowd-generated labels for actual entity resolution data sets. Then, we evaluated the prediction accuracy of two machine learning methods that use labels from crowd workers: an LPP method using consensus labels obtained by majority voting and our proposed method that combines multiple Laplacians directly by using crowdsourced data. We discussed the relationship between the accuracy of workers' labels and the prediction accuracy of the two methods.
doi:10.1145/2557977.2558060 dblp:conf/icuimc/WangOKK14 fatcat:2hwx52fnfrcqlbge7dldtx767e