Parser Training with Heterogeneous Treebanks [article]

Sara Stymne, Miryam de Lhoneux, Aaron Smith, Joakim Nivre
2018 arXiv pre-print
How to make the most of multiple heterogeneous treebanks when training a monolingual dependency parser is an open question. We start by investigating previously suggested, but little evaluated, strategies for exploiting multiple treebanks based on concatenating training sets, with or without fine-tuning. We go on to propose a new method based on treebank embeddings. We perform experiments for several languages and show that in many cases fine-tuning and treebank embeddings lead to substantial
improvements over single treebanks or concatenation, with average gains of 2.0--3.5 LAS points. We argue that treebank embeddings should be preferred due to their conceptual simplicity, flexibility and extensibility.
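The core idea of the treebank-embedding approach can be sketched briefly: every token representation is concatenated with a learned embedding of the ID of the treebank it comes from, so a single parser trained on the concatenation of heterogeneous treebanks can still adapt to each annotation style. The snippet below is a minimal illustrative sketch in PyTorch, not the authors' implementation; the class name, BiLSTM encoder, and all dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TreebankAwareEncoder(nn.Module):
    """Illustrative encoder with treebank embeddings (hypothetical sketch).

    Each word vector is concatenated with a learned vector for the source
    treebank, so one parser can be trained on several heterogeneous
    treebanks while still distinguishing their annotation conventions.
    """

    def __init__(self, vocab_size, n_treebanks,
                 word_dim=100, tb_dim=12, hidden_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # One embedding vector per treebank ID (dimensions are assumptions).
        self.treebank_emb = nn.Embedding(n_treebanks, tb_dim)
        self.bilstm = nn.LSTM(word_dim + tb_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, word_ids, treebank_id):
        # word_ids: (batch, seq_len); treebank_id: (batch,)
        words = self.word_emb(word_ids)                     # (batch, seq, word_dim)
        tb = self.treebank_emb(treebank_id)                 # (batch, tb_dim)
        tb = tb.unsqueeze(1).expand(-1, words.size(1), -1)  # broadcast over tokens
        contextual, _ = self.bilstm(torch.cat([words, tb], dim=-1))
        return contextual  # would be fed to an arc/label scorer in a full parser
```

At parsing time, the same mechanism lets the user select which treebank's annotation style the output should follow by choosing the treebank ID passed to the encoder.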
arXiv:1805.05089v1