Knowledge-Based Representation for Transductive Multilingual Document Classification [chapter]

Salvatore Romeo, Dino Ienco, Andrea Tagarelli
2015 Lecture Notes in Computer Science  
Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, which compounds the known difficulty of obtaining high-quality labeled datasets. To overcome such issues, we propose a new framework for multilingual document classification under a transductive learning setting. We exploit a large-scale multilingual knowledge base, BabelNet, to model documents written in different languages in a common conceptual space, without requiring any language translation process. We resort to a state-of-the-art transductive learner to produce the document classification. Results on two real-world multilingual corpora highlight the effectiveness of the proposed document model with respect to document representations commonly used in multilingual and cross-lingual analysis, and the robustness of the transductive setting for multilingual document classification.
doi:10.1007/978-3-319-16354-3_11 fatcat:2ysczrdjdjg4fpxnfxivsj5yjq