Multilingual Projection for Parsing Truly Low-Resource Languages

Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, Anders Søgaard
2016 Transactions of the Association for Computational Linguistics  
We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.
doi:10.1162/tacl_a_00100 fatcat:zulajoluuzci3foa5zzmv6w45y