Overview of the Cross-Domain Authorship Verification Task at PAN 2020

Mike Kestemont, Enrique Manjavacas, Ilia Markov, Janek Bevendorff, Matti Wiegmann, Efstathios Stamatatos, Martin Potthast, Benno Stein
2020 Conference and Labs of the Evaluation Forum  
Authorship identification remains a highly topical research problem in computational text analysis, with many relevant applications in contemporary society and industry. For this edition of PAN, we focused on authorship verification, where the task is to assess whether a pair of documents has been authored by the same individual. As in previous editions, we continued to work with (English-language) fanfiction written by non-professional authors. As a novelty, we substantially increased the size of the provided dataset to enable more data-hungry approaches. In total, thirteen systems (from ten participating teams) were submitted, and they are substantially more diverse than the submissions from previous years. We provide a detailed comparison of these approaches and two generic baselines. Our findings suggest that the increased scale of the training data boosts the state of the art in the field, but we also confirm the conventional issue that the field struggles with an overreliance on topic-related information.

Introduction

From the very beginning, authorship analysis tasks have played a key role within the PAN series. A variety of shared tasks have been developed over the past decade, complemented by the much-needed development of benchmark corpora for problems such as authorship attribution, authorship clustering, and authorship verification, both within and across genres and within and across languages. Rather than adding new task variants (or repeating existing ones), we decided this year to renew our mission and broaden our perspective by organizing an annual series of tasks of gradually increasing difficulty and realism, planned within a three-year strategy (2020-2023). In this endeavour, we also aim to integrate as many of the lessons learned from recent editions as possible. Among other things, we aim to devote explicit care to some of the larger challenges that remain open in the field, such as author-topic orthogonality, cross-genre
dblp:conf/clef/KestemontMMBWSP20 fatcat:a4ihisqt7zbypm6guu4ttvui4y