Query Translation for Cross-Language Information Retrieval using Multilingual Word Clusters

Paheli Bhattacharya, Pawan Goyal, Sudeshna Sarkar
2016 Workshop on South and Southeast Asian NLP  
In Cross-Language Information Retrieval, finding an appropriate translation of the source-language query has always been a difficult problem. We propose a technique for addressing this problem using multilingual word clusters obtained from multilingual word embeddings. We project the word embeddings of the languages into a common vector space and apply a community-detection algorithm to find clusters such that words representing the same concept in different languages fall into the same group. We use these multilingual word clusters to perform query translation for Cross-Language Information Retrieval for three languages: English, Hindi and Bengali. We experimented with the FIRE 2012 and Wikipedia datasets and show improvements over several standard methods, including a dictionary-based method, a transliteration-based model and Google Translate.
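
The pipeline described in the abstract (shared embedding space, clustering, cluster-based query translation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the toy vectors, the romanized Hindi and Bengali words, the 0.95 cosine threshold, and the function names `cluster_words` and `translate_query` are hypothetical. The abstract does not name the community-detection algorithm, so connected components of a similarity graph are used here as a simple stand-in.

```python
import math
from collections import defaultdict

# Toy vectors assumed to already live in one shared space.
# Values and romanized words are hypothetical, chosen only for illustration.
EMBEDDINGS = {
    ("en", "water"): (0.90, 0.10, 0.00),
    ("hi", "paani"): (0.88, 0.12, 0.02),
    ("bn", "jol"): (0.87, 0.15, 0.01),
    ("en", "school"): (0.05, 0.95, 0.10),
    ("hi", "vidyalaya"): (0.07, 0.93, 0.12),
    ("bn", "bidyaloy"): (0.04, 0.90, 0.15),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_words(embeddings, threshold=0.95):
    """Group words by connected components of a cosine-similarity graph.

    This is a stand-in for a real community-detection algorithm: two words
    are linked when their similarity reaches the (assumed) threshold.
    """
    words = list(embeddings)
    parent = {w: w for w in words}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for w in words:
        clusters[find(w)].add(w)
    return list(clusters.values())


def translate_query(query_terms, source_lang, target_lang, clusters):
    """Replace each source term with target-language words from its cluster."""
    translated = []
    for term in query_terms:
        for cluster in clusters:
            if (source_lang, term) in cluster:
                translated.extend(
                    sorted(w for lang, w in cluster if lang == target_lang)
                )
    return translated


CLUSTERS = cluster_words(EMBEDDINGS)
print(translate_query(["water", "school"], "en", "hi", CLUSTERS))
```

In this sketch a query term with no cluster match is silently dropped; a real system would need a fallback such as transliteration, and would use learned embeddings rather than hand-written vectors.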
dblp:conf/wssanlp/BhattacharyaGS16 fatcat:iw4moar74ngabmmawprv7uoh64