Cross-document Coreference for WePS

Iustin Dornescu, Constantin Orasan, Tatiana Lesnikova
2010 Conference and Labs of the Evaluation Forum  
Good clustering performance depends on the quality of the distance function used to assess similarity. In this paper we propose a pairwise document coreference model that improves on a word-vector similarity approach for the WePS 3 clustering task. We identify a simple criterion that discriminates between highly ambiguous queries (many small clusters) and balanced queries (fewer, larger clusters). We developed a document clustering framework that facilitates direct comparison between different parameters, features and algorithms. It uses a unified feature representation to support a wide variety of clustering pipelines. Using the predicted coreference likelihood and a simple clustering algorithm, we achieve comparable results on the WePS 2 dataset and competitive performance on the WePS 3 dataset.
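The abstract does not specify the coreference model or the clustering algorithm. The following is a minimal sketch under stated assumptions that shows the general shape of such a pipeline. A bag-of-words cosine similarity, like the one a word-vector baseline would use, is plugged into a simple threshold-based single-link clustering. The function names, the 0.5 threshold and the use of single-link clustering are illustrative assumptions. In the setting described above, the `score` callable would be replaced by the pairwise classifier's predicted coreference likelihood.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (word-vector baseline)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # Empty documents are treated as dissimilar to everything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_pairwise_score(docs, score, threshold=0.5):
    """Hypothetical simple clusterer: link every pair whose score reaches the
    threshold and return the connected components (single-link clustering)."""
    parent = list(range(len(docs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if score(docs[i], docs[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    Counter("john smith guitarist band album".split()),
    Counter("john smith band tour album".split()),
    Counter("john smith professor physics university".split()),
]
print(cluster_by_pairwise_score(docs, cosine, threshold=0.5))  # [[0, 1], [2]]
```

Keeping the pairwise score as a pluggable function reflects the separation the abstract describes. The same clustering step can be driven by raw vector similarity or by a learned coreference likelihood, so the two can be compared directly.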
dblp:conf/clef/DornescuOL10 fatcat:xjpmzk6bgbhvdnq2lelcuurdbe