Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages

Garrett Nicolai, David Yarowsky
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
A large percentage of computational tools are concentrated in a very small subset of the planet's languages. Compounding the issue, many languages lack the high-quality linguistic annotation necessary for the construction of such tools with current machine learning methods. In this paper, we address both issues simultaneously: leveraging the high accuracy of English taggers and parsers, we project morphological information onto translations of the Bible in 26 varied test languages. Using an iterative discovery, constraint, and training process, we build inflectional lexica in the target languages. Through a combination of iteration, ensembling, and reranking, we see double-digit relative error reductions in lemmatization and morphological analysis over a strong initial system.
doi:10.18653/v1/p19-1172 dblp:conf/acl/NicolaiY19 fatcat:yb4bxtyrxfe45e4ltmt2vmizbu