Cross Language Text Categorization Using a Bilingual Lexicon

Ke Wu, Xiaolin Wang, Bao-Liang Lu
2008 International Joint Conference on Natural Language Processing  
With the rapid growth of the Internet, an ever-increasing number of documents in languages other than English are available online. Cross language text categorization has attracted growing attention as a way to organize these heterogeneous document collections. In this paper, we focus on how to conduct effective cross language text categorization. To this end, we propose a cross language naive Bayes algorithm. Preliminary experiments on the collections we gathered show the effectiveness of the proposed method and verify the feasibility of achieving performance close to monolingual text categorization using a bilingual lexicon alone. Our algorithm is also more efficient than our baselines.
dblp:conf/ijcnlp/WuWL08 fatcat:zb34hrh2cbflznxdaoce7jfsca