NTHU at NTCIR-10 CrossLink-2: An Approach toward Semantic Features

Yu-Lan Liu, Joanne Boisson, Jason S. Chang
2013 NTCIR Conference on Evaluation of Information Access Technologies  
This paper describes NTHU's approaches to the NTCIR-10 Cross-Lingual Link Discovery task, also known as CrossLink-2. The task is to discover valuable anchors in Chinese, Japanese or Korean (CJK) articles and to link these anchors to related English Wikipedia pages. To achieve this objective, we not only rely on Wikipedia's distinguishing features (e.g. anchor link information and language links) but also develop a method that analyzes the semantic features of anchor texts in ... Chinese Wikipedia. In the linking phase, a Latent Dirichlet Allocation (LDA) model is used to compute a text similarity measure among the English Wikipedia articles. This novel approach to addressing the word-to-links ambiguity issue shows encouraging results in the CrossLink-2 evaluation.
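As a rough illustration of the linking phase, the sketch below ranks candidate English pages for an ambiguous anchor by comparing topic distributions. It assumes the topic vectors have already been produced by a trained LDA model, and it uses cosine similarity as the measure; the abstract does not say which similarity measure the authors used, so that choice, the function names, and the example vectors are all hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same number of topics")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector carries no topical information
    return dot / (norm_p * norm_q)


def rank_candidates(context_topics, candidate_topics):
    """Sort candidate pages by similarity to the context, best first.

    context_topics:   topic distribution of the text around the anchor
                      (assumed to come from an LDA model).
    candidate_topics: dict mapping a candidate page title to its
                      topic distribution (same assumption).
    """
    scored = [
        (title, cosine_similarity(context_topics, topics))
        for title, topics in candidate_topics.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-topic vectors, for illustration only.
context = [0.7, 0.2, 0.1]
candidates = {
    "Apple Inc.": [0.8, 0.1, 0.1],
    "Apple": [0.1, 0.2, 0.7],
}
ranking = rank_candidates(context, candidates)
best_title = ranking[0][0]
```

With these made-up vectors the context shares its dominant topic with the first candidate, so that page ranks first. A real system would obtain the vectors from a model trained on English Wikipedia and would probably combine this score with link-based evidence.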
dblp:conf/ntcir/LiuBC13 fatcat:ro62au3jmrgdpf76hhbe2mople