UniNE at CLEF 2016: Author Clustering

Mirco Kocher
2016 Conference and Labs of the Evaluation Forum  
This paper describes and evaluates an effective unsupervised author clustering and authorship linking model called SPATIUM-L1. The suggested strategy can be readily adapted to different languages (such as Dutch, English, and Greek) and to different genres (e.g., newspaper articles and reviews). As features, we suggest using the m most frequent terms of each text (isolated words and punctuation symbols, with m at most 200). Applying a simple distance measure, we determine whether there is enough indication that two texts were written by the same author. The evaluations are based on six test collections (PAN AUTHOR CLUSTERING task at CLEF 2016).
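
The feature extraction and pairwise comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that the "simple distance measure" is an L1 (Manhattan) distance over relative term frequencies, as the model name suggests, and that the m terms are taken from the first text of each pair. The tokenizer, the function names, and the threshold value of 0.5 are hypothetical choices made only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text, then split it into isolated words
    # and single punctuation symbols.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def relative_frequencies(tokens):
    # Map each term to its share of all tokens in the text.
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def l1_distance(text_a, text_b, m=200):
    # Sum of absolute differences in relative frequency, restricted to
    # the m most frequent terms of text_a (assumed, so not symmetric).
    tokens_a = tokenize(text_a)
    freq_a = relative_frequencies(tokens_a)
    freq_b = relative_frequencies(tokenize(text_b))
    top_terms = [term for term, _ in Counter(tokens_a).most_common(m)]
    return sum(abs(freq_a[t] - freq_b.get(t, 0.0)) for t in top_terms)


def same_author(text_a, text_b, threshold=0.5, m=200):
    # Hypothetical decision rule: a small distance is read as enough
    # indication of shared authorship. The threshold is illustrative.
    return l1_distance(text_a, text_b, m) < threshold
```

Linking every pair of texts whose distance falls below the threshold, and then grouping linked texts together, would be one way to derive clusters from such pairwise decisions; the abstract does not specify how this step is carried out.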
dblp:conf/clef/Kocher16 fatcat:gjzq7mxtgzgafedwnthqguwrve