Large-Scale Evaluation of Keyphrase Extraction Models

Ygor Gallina, Florian Boudin, Béatrice Daille
2020 Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020  
Keyphrase extraction models are usually evaluated under different, not directly comparable, experimental setups. As a result, it remains unclear how well proposed models actually perform, and how they compare to each other. In this work, we address this issue by presenting a systematic large-scale analysis of state-of-the-art keyphrase extraction models involving multiple benchmark datasets from various sources and domains. Our main results reveal that state-of-the-art models are in fact still challenged by simple baselines on some datasets. We also present new insights about the impact of using author- or reader-assigned keyphrases as a proxy for gold standard, and give recommendations for strong baselines and reliable benchmark datasets.
doi:10.1145/3383583.3398517 dblp:conf/jcdl/GallinaBD20 fatcat:bjtfeghnxvfcjp4fvve5wtcegm