Evaluation of Finite State Morphological Analyzers Based on Paradigm Extraction from Wiktionary

Ling Liu, Mans Hulden
2017 Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP 2017)  
Similar languages share a large number of cognate words, which can be exploited to deal with the out-of-vocabulary (OOV) word problem. This problem is especially severe for resource-scarce languages. We propose a method for 'word transduction' to address it. We take advantage of the fact that, although it is difficult to prepare a sentence-aligned parallel corpus for such languages, it is much easier to prepare a 'parallel' list of word pairs which are cognates and have similar ... ns. We can try to learn pronunciations (or orthographic representations) of OOV words from such a parallel list. This could be done using phrase-based machine translation (PBMT). We show that, for small amounts of data, a model based on weighted rewrite rules for phoneme chunks outperforms a PBMT-based approach. An additional point we make is that word transduction can also be used to borrow words from another similar language and adapt them to the phonology of the target language.
doi:10.18653/v1/w17-4009 dblp:conf/fsmnlp/SharmaS17 fatcat:tykwl4gmhbdvtix5tu65ogpzcy