Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French

Spence Green, Marie-Catherine de Marneffe, John Bauer, Christopher D. Manning
2011 Conference on Empirical Methods in Natural Language Processing  
Multiword expressions (MWE), a known nuisance for both linguistics and NLP, blur the lines between syntax and semantics. Previous work on MWE identification has relied primarily on surface statistics, which perform poorly for longer MWEs and cannot model discontinuous expressions. To address these problems, we show that even the simplest parsing models can effectively identify MWEs of arbitrary length, and that Tree Substitution Grammars achieve the best results. Our experiments show a 36.4% F1 absolute improvement for French over an n-gram surface statistics baseline, currently the predominant method for MWE identification. Our models are useful for several NLP tasks in which MWE pre-grouping has improved accuracy.
dblp:conf/emnlp/GreenMBM11 fatcat:f73vxwdd3nekjn6y2xebskqu4m