Multilingual Text Classification

Prof. Praveen Dhyani, Sonam Mittal
2015 International Journal of Engineering Research and  
Identifying the language of a document is typically the first step in most Natural Language Processing tasks. Among the wide variety of language identification methods discussed in the literature, those employing the Cavnar and Trenkle (1994) approach to text categorization, which is based on character n-gram frequencies, have been particularly successful. Multilingual text classification using n-gram techniques appears to have produced very interesting results in the field of text categorization, not only for languages like English and French but equally for languages that are more difficult to classify, such as Spanish, Italian, German and Russian.
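As an illustration of the character n-gram idea referred to above, the following is a minimal sketch of rank-ordered n-gram profiles compared with an "out-of-place" distance, in the spirit of Cavnar and Trenkle (1994). It is not the authors' implementation: the function names, the n-gram range (`max_n=3`), the profile size (`top_k=300`), the tie-breaking rule and the two toy training sentences are all assumptions chosen for brevity.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n),
    ordered from most to least frequent. max_n and top_k are assumed values."""
    counts = Counter()
    for token in text.lower().split():
        padded = f"_{token}_"  # mark word boundaries with underscores
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles; n-grams missing from
    the language profile receive a fixed maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def identify(text, lang_profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


# Hypothetical toy training data; a real system would use far larger samples.
EN_SAMPLE = "the quick brown fox jumps over the lazy dog and the cat"
FR_SAMPLE = "le renard brun saute par dessus le chien paresseux et le chat"
PROFILES = {"en": ngram_profile(EN_SAMPLE), "fr": ngram_profile(FR_SAMPLE)}

print(identify("the dog and the fox", PROFILES))
```

With training text this small the result is only indicative; the approach depends on profiles built from substantially more text per language.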
doi:10.17577/ijertv4is030032 fatcat:ehcyezsm7rep3bmsutnfe6afbm