A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2015.
The file type is application/pdf.
Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System
2014
Proceedings of the First Workshop on Computational Approaches to Code Switching
We describe a CRF-based system for word-level language identification of code-mixed text. Our method uses lexical, contextual, character n-gram, and special-character features, and can therefore easily be replicated across languages. Its performance is benchmarked against the test sets provided by the shared task on code-mixing (Solorio et al., 2014) for four language pairs, namely, English-Spanish (En-Es), English-Nepali (En-Ne), English-Mandarin (En-Cn), and Standard Arabic-Arabic (Ar-Ar).
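The four feature families named in the abstract (lexical, contextual, character n-gram, and special-character) can be illustrated with a per-token feature extractor of the kind a linear-chain CRF tagger consumes. This is a minimal sketch, not the authors' feature set: the feature names, the context window of 2 tokens, the n-gram range of 1 to 3, and the hashtag/mention/URL checks are all assumptions chosen for illustration. Each token maps to a dict of feature values, and a sentence maps to a list of such dicts, a format that CRF toolkits such as sklearn-crfsuite accept as input.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, float]


def char_ngrams(word: str, n_min: int = 1, n_max: int = 3) -> List[str]:
    """Character n-grams of a word padded with '<' and '>' boundary markers."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, Feature]:
    """Features for tokens[i]; names and window size are illustrative assumptions."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        # Lexical features: the word form and its casing/shape.
        "word.lower": word.lower(),
        "word.is_title": word.istitle(),
        "word.is_upper": word.isupper(),
        "word.is_digit": word.isdigit(),
        # Special-character features (assumed: social-media markers and punctuation).
        "word.has_hash": word.startswith("#"),
        "word.has_at": word.startswith("@"),
        "word.is_url": word.lower().startswith(("http://", "https://", "www.")),
        "word.is_punct": all(not c.isalnum() for c in word),
    }
    # Character n-gram features as binary indicators (repeated n-grams stay 1.0).
    for gram in char_ngrams(word.lower()):
        feats[f"char_ngram={gram}"] = 1.0
    # Contextual features: neighbouring words, padded at sentence boundaries.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"word[{offset:+d}].lower"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per token: the sequence a CRF labels with language tags."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    tokens = ["I", "love", "tacos", "hoy", "#yolo"]
    feats = sentence_features(tokens)
    print(feats[2]["word[-1].lower"], feats[4]["word.has_hash"])
```

Running the script prints `love True`. In a full system these feature sequences would be paired with per-token gold labels (for example `lang1`, `lang2`, `other`) and used to train the CRF; the label set shown here is likewise only an example.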
doi:10.3115/v1/w14-3908
dblp:conf/acl-codeswitch/ChittaranjanVBC14
fatcat:5gdztcs27nh5xhlywsqyakhdzu