Grammatical Disambiguation in the Tatar National Corpus

Bulat Khakimov, Ramil Gataullin, Rinat Gilmullin
unpublished
This paper addresses grammatical ambiguity in the Tatar National Corpus and the possibilities for automating the disambiguation process in the corpus. Grammatical ambiguity is widespread in agglutinative languages such as the Turkic and Finno-Ugric languages. To build a grammatically disambiguated subcorpus, we have developed a special software module that searches the corpus for ambiguous tokens, collects statistical information, and supports creating and implementing the ...l disambiguation rules for different ambiguity types. Disambiguation in the corpus is based on a context-oriented classification of ambiguity types, which has been carried out for the first time on statistical corpus data for the Tatar language. The corpus thus serves both as the source of our research and as the destination for implementing its results. The estimated cumulative effect of disambiguating the identified frequent ambiguity types in the Tatar National Corpus can reach up to 50%.
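The abstract does not describe the module's internals. As a minimal sketch, assuming each token carries a list of candidate grammatical analyses and treating an "ambiguity type" as the set of competing analyses, the two steps named above (collecting statistics on ambiguous tokens and applying context-oriented rules per ambiguity type) might look like this in Python. The `Token` and `Rule` classes, the `next_token_has` condition, the tags (`N`, `V`, `ADJ`, `PUNCT`), and the tokens `w1` to `w7` are all hypothetical placeholders, not the authors' data model or actual Tatar analyses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# Hypothetical data model: a token with its candidate analyses.
@dataclass
class Token:
    form: str
    analyses: list[str]  # placeholder tag strings, not real Tatar tags

    @property
    def is_ambiguous(self) -> bool:
        return len(self.analyses) > 1

def ambiguity_type(token: Token) -> tuple[str, ...]:
    # An ambiguity type is modelled here as the sorted set of competing analyses.
    return tuple(sorted(set(token.analyses)))

def collect_statistics(sentences: list[list[Token]]) -> Counter:
    # Count how often each ambiguity type occurs in the corpus.
    counts: Counter = Counter()
    for sentence in sentences:
        for tok in sentence:
            if tok.is_ambiguous:
                counts[ambiguity_type(tok)] += 1
    return counts

# Hypothetical context-oriented rule: for one ambiguity type, keep `choose`
# whenever `condition` holds for the token's context.
@dataclass
class Rule:
    target: tuple[str, ...]
    condition: Callable[[list[Token], int], bool]
    choose: str

def next_token_has(tag: str) -> Callable[[list[Token], int], bool]:
    # Condition: the following token has `tag` among its analyses.
    def cond(sentence: list[Token], i: int) -> bool:
        return i + 1 < len(sentence) and tag in sentence[i + 1].analyses
    return cond

def apply_rules(sentence: list[Token], rules: list[Rule]) -> int:
    # Resolve ambiguous tokens in place; return how many were resolved.
    resolved = 0
    for i, tok in enumerate(sentence):
        if not tok.is_ambiguous:
            continue
        for rule in rules:
            if (ambiguity_type(tok) == rule.target
                    and rule.choose in tok.analyses
                    and rule.condition(sentence, i)):
                tok.analyses = [rule.choose]
                resolved += 1
                break
    return resolved

# Toy corpus with invented tokens and tags.
corpus = [
    [Token("w1", ["N"]), Token("w2", ["N", "V"]), Token("w3", ["PUNCT"])],
    [Token("w4", ["ADJ", "N"]), Token("w5", ["N"])],
    [Token("w6", ["N", "V"]), Token("w7", ["N"])],
]
rules = [
    Rule(("ADJ", "N"), next_token_has("N"), "ADJ"),
    Rule(("N", "V"), next_token_has("PUNCT"), "V"),
]

stats = collect_statistics(corpus)          # statistics before disambiguation
resolved = sum(apply_rules(s, rules) for s in corpus)
remaining = collect_statistics(corpus)      # ambiguity left after the rules
total = sum(stats.values())
print(f"resolved {resolved} of {total} ambiguous tokens")
```

Counting ambiguity types before writing rules mirrors the abstract's point that frequent types are identified from corpus statistics first; the share of ambiguous tokens resolved (`resolved / total`) is one simple way to express a cumulative effect, though the paper's own measure may differ.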
doi:10.29007/jkgl