A corpus of K'iche' annotated for morphosyntactic structure

Francis Tyers, Robert Henderson
2021 Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas   unpublished
This article describes a collection of sentences in K'iche' annotated for morphology and syntax. K'iche' is a Mayan language spoken in Guatemala. The annotation follows the guidelines of the Universal Dependencies project. The corpus consists of 1,433 sentences containing approximately 10,000 tokens and is released under a free/open-source licence. We present a comparison of parsing systems for K'iche' using this corpus and describe how the corpus can be used for mining linguistic examples.
doi:10.18653/v1/2021.americasnlp-1.2 fatcat:eyumtxu5kbdpdgcbrojopxonsa
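
The abstract mentions using the corpus to mine linguistic examples. As a rough illustration of what that could look like, the sketch below assumes the corpus is available in CoNLL-U, the ten-column tab-separated format used for Universal Dependencies treebanks. The helper names `read_conllu` and `find_sentences` are hypothetical, and the sample sentences use placeholder tokens (`w1`, `w2`, ...) invented for this example; they are not taken from the K'iche' corpus and this is not the authors' own tooling.

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

Token = Dict[str, str]


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of token dicts per sentence in a CoNLL-U string."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# sent_id = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # skip malformed lines
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence


def find_sentences(sentences: Iterable[List[Token]],
                   upos: str, deprel: str) -> List[str]:
    """Return the surface text of sentences containing a token with the
    given part-of-speech tag and dependency relation."""
    hits = []
    for sent in sentences:
        if any(t["upos"] == upos and t["deprel"] == deprel for t in sent):
            hits.append(" ".join(t["form"] for t in sent))
    return hits


# Placeholder data invented for this sketch (NOT real K'iche' sentences).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "1\tw1\tw1\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tw2\tw2\tNOUN\t_\t_\t1\tnsubj\t_\t_",
    "",
    "# sent_id = toy-2",
    "1\tw3\tw3\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tw4\tw4\tNOUN\t_\t_\t1\tobj\t_\t_",
    "",
])

# Find sentences that contain a nominal subject.
matches = find_sentences(read_conllu(SAMPLE), upos="NOUN", deprel="nsubj")
print(matches)
```

To run this on a real treebank file, one would read the file contents into a string and pass it to `read_conllu`. Filtering on the `feats` column instead would allow searching for particular morphological features.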