Cross-Lingual Bridges with Models of Lexical Borrowing

Yulia Tsvetkov, Chris Dyer
The Journal of Artificial Intelligence Research, 2016
Linguistic borrowing is the phenomenon of transferring linguistic constructions (lexical, phonological, morphological, and syntactic) from a "donor" language to a "recipient" language as a result of contact between communities speaking different languages. Borrowed words are found in all languages, and, in contrast to cognate relationships, borrowing relationships may exist across unrelated languages (for example, about 40% of Swahili's vocabulary is borrowed from the unrelated language Arabic).
In this work, we develop a model of morpho-phonological transformations across languages. Its features are based on universal constraints from Optimality Theory (OT), and we show that, compared to several standard but linguistically more naïve baselines, our OT-inspired model obtains good performance at predicting donor forms from borrowed forms with only a few dozen training examples, making this a cost-effective strategy for sharing lexical information across languages. We demonstrate applications of the lexical borrowing model in machine translation, using a resource-rich donor language to obtain translations of out-of-vocabulary loanwords in a lower-resource language. Our framework obtains substantial improvements (up to 1.6 BLEU) over standard baselines.
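The idea of scoring donor candidates with OT-style constraints can be illustrated with a toy, Harmonic Grammar-style sketch. This is not the authors' model. The constraint set (MAX, DEP-V, DEP-C, IDENT), the weights, the use of letters as stand-ins for phonological segments, and the function names `violations`, `harmony`, and `rank_donors` are all hypothetical. The sketch counts faithfulness violations on the cheapest alignment of each candidate donor form to an observed loanword, then ranks candidates by their weighted violation totals. The example uses the Swahili loanword kitabu 'book' and Arabic kitāb in simplified ASCII transliteration.

```python
from typing import Dict, List, Sequence, Tuple

VOWELS = set("aeiou")

# Hypothetical constraint weights (Harmonic Grammar style); a real model
# would learn weights from a few dozen donor/loanword pairs.
WEIGHTS: Dict[str, float] = {
    "MAX": 2.0,    # a donor segment has no correspondent in the loanword (deletion)
    "DEP-V": 0.5,  # a loanword vowel has no donor correspondent (vowel epenthesis)
    "DEP-C": 2.0,  # a loanword consonant has no donor correspondent
    "IDENT": 1.0,  # corresponding segments differ (substitution)
}

# A DP cell: (weighted cost so far, violation counts so far).
Cell = Tuple[float, Dict[str, int]]


def _violate(cell: Cell, constraint: str) -> Cell:
    """Return a new cell with one more violation of `constraint`."""
    cost, counts = cell
    new_counts = dict(counts)  # copy so shared cells are never mutated
    new_counts[constraint] += 1
    return cost + WEIGHTS[constraint], new_counts


def violations(donor: str, loan: str) -> Dict[str, int]:
    """Violation counts on the cheapest donor -> loanword alignment."""
    n, m = len(donor), len(loan)
    start: Cell = (0.0, {c: 0 for c in WEIGHTS})
    # table[i][j] is the best cell for mapping donor[:i] onto loan[:j].
    table: List[List[Cell]] = [[start] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options: List[Cell] = []
            if i > 0:  # donor[i-1] is deleted
                options.append(_violate(table[i - 1][j], "MAX"))
            if j > 0:  # loan[j-1] is inserted
                dep = "DEP-V" if loan[j - 1] in VOWELS else "DEP-C"
                options.append(_violate(table[i][j - 1], dep))
            if i > 0 and j > 0:  # donor[i-1] corresponds to loan[j-1]
                pair = table[i - 1][j - 1]
                same = donor[i - 1] == loan[j - 1]
                options.append(pair if same else _violate(pair, "IDENT"))
            table[i][j] = min(options, key=lambda cell: cell[0])
    return table[n][m][1]


def harmony(donor: str, loan: str) -> float:
    """Negative weighted violation total: higher means a better donor candidate."""
    counts = violations(donor, loan)
    return -sum(WEIGHTS[c] * v for c, v in counts.items())


def rank_donors(loan: str, candidates: Sequence[str]) -> List[str]:
    """Order candidate donor forms from most to least harmonic."""
    return sorted(candidates, key=lambda d: harmony(d, loan), reverse=True)


if __name__ == "__main__":
    # Swahili "kitabu" ('book') is borrowed from Arabic kitab (simplified ASCII).
    print(rank_donors("kitabu", ["kutub", "kitab", "katib"]))
    print(violations("kitab", "kitabu"))
```

Under these assumed weights, kitab ranks first: it needs only one word-final vowel epenthesis (a single DEP-V violation), whereas kutub and katib also require vowel substitutions. Giving vowel epenthesis a low weight encodes the illustrative assumption that a recipient language preferring open syllables repairs a final consonant by inserting a vowel rather than deleting it.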
doi:10.1613/jair.4786