On the evaluation and comparison of taggers

Lluís Padró, Lluís Màrquez
1998 Proceedings of the 36th annual meeting on Association for Computational Linguistics
This paper addresses the issue of POS tagger evaluation. Such evaluation is usually performed by comparing the tagger output with a reference test corpus, which is assumed to be error-free. Currently used corpora contain noise which causes the obtained performance to be a distortion of the real value. We analyze to what extent this distortion may invalidate the comparison between taggers or the measure of the improvement given by a new system. The main conclusion is that a more rigorous testing experimentation setting/designing is needed to reliably evaluate and compare tagger accuracies.
doi:10.3115/980691.980733 dblp:conf/acl/PadroM98 fatcat:e5dslpwkrzgqpit7woylmverya