Cross parser evaluation and tagset variation

Djamé Seddah, Marie Candito, Benoît Crabbé
2009 Proceedings of the 11th International Conference on Parsing Technologies - IWPT '09   unpublished
This paper presents preliminary investigations on the statistical parsing of French by providing a complete evaluation, on French data, of the main probabilistic lexicalized and unlexicalized parsers first designed for the Penn Treebank. We adapted the parsers to the two existing treebanks of French (Abeillé et al., 2003; Schluter and van Genabith, 2007). To our knowledge, almost all of the results reported here are state-of-the-art for the constituent parsing of French on every available treebank. Regarding the algorithms, the comparisons show that lexicalized parsing models are outperformed by the unlexicalized Berkeley parser. Regarding the treebanks, we observe that, depending on the parsing model, a tag set with specific features has a direct influence on evaluation results. We show that the adapted lexicalized parsers do not share the same sensitivity to the amount of lexical material used for training, thus questioning the relevance of using only one lexicalized model to study the usefulness of lexicalization for the parsing of French.
doi:10.3115/1697236.1697266 fatcat:mkoavmid4va4vahgk7eyl3meuu