Mixed Attributes Two-Stage-Clustering Entity Resolution

LEI Gang
2015 Journal of Communication and Computer  
Record matching and clustering are two essential steps in entity resolution, and clustering on a single text similarity based on tf-idf (term frequency-inverse document frequency) features often yields poor precision when resolving travel-spot entities. The paper outlines a mixed attributes two-stage-clustering entity resolution framework (abbreviated as MATC-ER) and designs an approach that measures similarity by combining the spot name and the spot introduction, making good use of the record information at different stages. The paper then demonstrates its efficiency through comparative experiments on real travel-spot data.
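The abstract does not spell out what each clustering stage does, so the following is only an illustrative sketch of the general idea, not the paper's MATC-ER algorithm. It assumes that stage one groups records by spot-name similarity (character-bigram Jaccard) and that stage two splits each group by tf-idf cosine similarity of the spot introductions. The function names, the thresholds `name_t` and `intro_t`, the single-link grouping, and the sample records are all hypothetical.

```python
import math
from collections import Counter


def name_similarity(a, b):
    """Jaccard similarity over character bigrams (an assumed name measure)."""
    def bigrams(s):
        s = s.lower().replace(" ", "")
        return {s[i:i + 2] for i in range(len(s) - 1)} or {s}
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


def tfidf_vectors(texts):
    """Sparse tf-idf vectors with a smoothed idf: log((1 + n) / (1 + df)) + 1."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in d.items()}
            for d in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def single_link(items, sim, threshold):
    """Group items whose pairwise similarity reaches the threshold (union-find)."""
    items = list(items)
    parent = {i: i for i in items}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, a in enumerate(items):
        for b in items[pos + 1:]:
            if sim(a, b) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for i in items:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


def two_stage(records, name_t=0.4, intro_t=0.3):
    """records: list of (spot_name, spot_introduction) pairs."""
    names = [r[0] for r in records]
    vecs = tfidf_vectors([r[1] for r in records])
    # Stage 1 (assumed): coarse groups from name similarity only.
    coarse = single_link(range(len(records)),
                         lambda a, b: name_similarity(names[a], names[b]),
                         name_t)
    # Stage 2 (assumed): refine each coarse group using the introductions.
    final = []
    for group in coarse:
        final.extend(single_link(group,
                                 lambda a, b: cosine(vecs[a], vecs[b]),
                                 intro_t))
    return sorted(final)


# Hypothetical sample data: two records for one lake, a same-named lake
# in a different city, and an unrelated spot.
records = [
    ("West Lake", "famous freshwater lake in hangzhou with pagodas and gardens"),
    ("West Lake Scenic Area", "freshwater lake in hangzhou surrounded by gardens and pagodas"),
    ("West Lake", "small lake park in fuzhou city centre"),
    ("Great Wall", "ancient fortification wall in northern china"),
]
clusters = two_stage(records)
print(clusters)
```

In this toy run the name stage places the three "West Lake" records in one group, and the introduction stage then separates the record whose description points to a different city. Using the name first and the introduction second is one plausible reading of "different stages"; the paper itself may order or weight the attributes differently.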
doi:10.17265/1548-7709/2015.06.003 fatcat:re3zxvde2zanhkfilfaawulihq