Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System

Gokul Chittaranjan, Yogarshi Vyas, Kalika Bali, Monojit Choudhury
2014 Proceedings of the First Workshop on Computational Approaches to Code Switching  
We describe a CRF-based system for word-level language identification of code-mixed text. Our method uses lexical, contextual, character n-gram, and special character features, and can therefore easily be replicated across languages. Its performance is benchmarked against the test sets provided by the shared task on code-mixing (Solorio et al., 2014) for four language pairs, namely, English-Spanish (En-Es), English-Nepali (En-Ne), English-Mandarin (En-Cn), and Standard Arabic-Arabic (Ar-Ar) Dialects. The experimental results show a consistent performance across the language pairs.
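The four feature families named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' feature set or implementation: the function names (`char_ngrams`, `token_features`), the feature keys, the `lexicons` argument, and the n-gram range are all assumptions made for illustration only.

```python
import string


def char_ngrams(word, n_min=1, n_max=3):
    """Return character n-grams of a word padded with boundary markers."""
    padded = "^" + word + "$"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def token_features(tokens, i, lexicons=None, window=1):
    """Build a feature dict for tokens[i] (hypothetical feature names).

    lexicons: assumed mapping of language code -> set of lowercase words.
    """
    lexicons = lexicons or {}
    word = tokens[i]
    lower = word.lower()
    feats = {"word": lower}

    # Lexical features: membership in per-language word lists.
    for lang, vocab in lexicons.items():
        feats["in_lexicon_" + lang] = lower in vocab

    # Character n-gram features.
    for gram in char_ngrams(lower):
        feats["ngram_" + gram] = True

    # Special character features.
    feats["is_capitalized"] = word[:1].isupper()
    feats["has_digit"] = any(c.isdigit() for c in word)
    feats["has_punct"] = any(c in string.punctuation for c in word)
    feats["starts_with_at_or_hash"] = word[:1] in ("@", "#")

    # Contextual features: neighbouring words within the window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        in_range = 0 <= j < len(tokens)
        feats["word_%+d" % offset] = tokens[j].lower() if in_range else "<pad>"

    return feats


if __name__ == "__main__":
    sentence = ["I", "love", "tacos", "#yum"]
    toy_lexicons = {"en": {"i", "love"}, "es": {"tacos"}}
    print(token_features(sentence, 1, toy_lexicons))
```

Calling `token_features` once per token yields one feature dictionary per position. A sequence of such dictionaries is the kind of input that a CRF toolkit accepting per-token feature dictionaries could consume, with one language label per token as the output.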
doi:10.3115/v1/w14-3908 dblp:conf/acl-codeswitch/ChittaranjanVBC14 fatcat:5gdztcs27nh5xhlywsqyakhdzu