Knowledge Transfer across Multilingual Corpora via Latent Topics [chapter]

Wim De Smet, Jie Tang, Marie-Francine Moens
2011 Lecture Notes in Computer Science  
This paper explores bridging the content of two different languages via latent topics. Specifically, we propose a unified probabilistic model that simultaneously models latent topics from bilingual corpora discussing comparable content, and we use these topics as features in a cross-lingual, dictionary-less text categorization task. Experimental results on multilingual Wikipedia data show that the proposed topic model effectively discovers the topic information in the bilingual corpora, and that the topics successfully transfer classification knowledge to other languages for which no labeled training data are available.
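The transfer idea in the abstract rests on one property: documents in both languages are mapped into the same latent topic space, so a classifier trained on topic features from one language can be applied unchanged to the other. The sketch below illustrates only that transfer step, under stated assumptions. It assumes per-document topic proportions have already been inferred by some bilingual topic model with shared topics (it does not implement the paper's model). The topic vectors and class labels are invented for illustration, not data from the paper. It uses a simple nearest-centroid classifier with cosine similarity as a stand-in; the paper's actual classifier may differ.

```python
from collections import defaultdict
from math import sqrt


def centroids(features, labels):
    """Average the topic vectors of labeled source-language documents per class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for k, p in enumerate(vec):
            sums[label][k] += p
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def cosine(a, b):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def classify(vec, cents):
    """Assign the class whose centroid is most similar in the shared topic space."""
    return max(cents, key=lambda c: cosine(vec, cents[c]))


# Hypothetical topic proportions over K = 3 topics shared by both languages.
# Only the source-language documents carry labels.
source_train = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
source_labels = ["sports", "sports", "politics", "politics"]

# Unlabeled target-language documents, inferred into the same topic space.
target_docs = [[0.75, 0.15, 0.10], [0.05, 0.15, 0.80]]

cents = centroids(source_train, source_labels)
predictions = [classify(v, cents) for v in target_docs]
print(predictions)  # ['sports', 'politics']
```

No target-language labels or bilingual dictionary are used at classification time. The shared topic space alone carries the class information across languages, which is the property the abstract's "dictionary-less" setting relies on.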
doi:10.1007/978-3-642-20841-6_45