Cross-lingual text classification with model translation and document translation

Teng-Sheng Moh, Zhang Zhang
2012 Proceedings of the 50th Annual Southeast Regional Conference on - ACM-SE '12  
Most enterprise search engines employ data mining classifiers to categorize documents. With economic globalization, many companies now have overseas branches or divisions, and those branches write their documents and emails in local languages. When a classifier tries to categorize documents written in another language, a model trained on a single language will not work. The most direct solution is to translate those documents into one language with a machine translator. But this solution suffers from the inaccuracy of machine translation, and the translation overhead is economically inefficient. Another approach is to translate the features extracted from one language into another language and use them to classify documents in that language. This approach is efficient but still suffers from translation inaccuracy and the cultural gap between languages. In this project, the authors propose a new method that combines model translation and document translation, taking advantage of the strengths of both approaches.
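The two baseline approaches contrasted above can be illustrated with a toy example. This is a minimal sketch, assuming a linear bag-of-words classifier trained on English and a hypothetical word-level bilingual dictionary standing in for a machine translator. All names (`BILINGUAL_DICT`, `ENGLISH_MODEL`, `translate_document`, `translate_model`) and all weights are illustrative assumptions, not the authors' implementation, and the paper's combined method is not reproduced here. Document translation maps the foreign document into English before scoring; model translation instead re-keys the English feature weights into the foreign language and scores the untranslated document.

```python
from collections import Counter

# Hypothetical toy bilingual dictionary (Spanish word -> English word, accents omitted).
# A real system would use a machine translator or a learned bilingual lexicon.
BILINGUAL_DICT = {
    "precio": "price",
    "factura": "invoice",
    "pago": "payment",
    "reunion": "meeting",
    "agenda": "agenda",
    "equipo": "team",
}

# Hypothetical English linear model: per-class feature weights learned on English documents.
ENGLISH_MODEL = {
    "finance": {"price": 1.0, "invoice": 1.5, "payment": 1.2},
    "meetings": {"meeting": 1.5, "agenda": 1.0, "team": 0.8},
}


def score(weights, tokens):
    """Linear bag-of-words score: sum of weight * term frequency."""
    counts = Counter(tokens)
    return sum(weights.get(tok, 0.0) * n for tok, n in counts.items())


def classify(model, tokens):
    """Return the label whose weight vector gives the highest score."""
    return max(model, key=lambda label: score(model[label], tokens))


def translate_document(tokens, dictionary):
    """Document translation: map each foreign token to English; unknown words are dropped."""
    return [dictionary[tok] for tok in tokens if tok in dictionary]


def translate_model(model, dictionary):
    """Model translation: copy each English feature weight onto its foreign counterpart(s)."""
    reverse = {}
    for foreign, english in dictionary.items():
        reverse.setdefault(english, []).append(foreign)
    return {
        label: {f: w for eng, w in weights.items() for f in reverse.get(eng, [])}
        for label, weights in model.items()
    }


if __name__ == "__main__":
    spanish_doc = ["factura", "pago", "precio", "reunion"]
    # Route 1: translate the document, then apply the English model.
    print(classify(ENGLISH_MODEL, translate_document(spanish_doc, BILINGUAL_DICT)))
    # Route 2: translate the model, then apply it to the original document.
    print(classify(translate_model(ENGLISH_MODEL, BILINGUAL_DICT), spanish_doc))
```

With this one-to-one word dictionary the two routes produce identical scores by construction. In practice they diverge because a full machine translator uses context and word order while a translated feature set does not, which is the kind of trade-off the abstract describes between the two methods.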
doi:10.1145/2184512.2184530 dblp:conf/ACMse/MohZ12 fatcat:fx7qigosrzg4rdadtzre4p4azq