Finding Terminology Translations from Non-parallel Corpora

Pascale Fung
1997 Workshop on Very Large Corpora  
We present a statistical word feature, the Word Relation Matrix, which can be used to find translated pairs of words and terms from non-parallel corpora, across language groups. Online dictionary entries are used as seed words to generate Word Relation Matrices for the unknown words according to correlation measures. Word Relation Matrices are then mapped across the corpora to find translation pairs. Translation accuracies are around 30% when only the top candidate is counted. Nevertheless, the top 20 candidate outputs give a 50.9% average increase in accuracy on human translator performance.
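
The pipeline summarized in the abstract (seed words from a dictionary, a correlation profile per word, then matching profiles across corpora) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not name the correlation measure or the matching function, so the Dice coefficient and cosine similarity below are stand-in assumptions, and the toy corpora, seed pairs, and function names are hypothetical.

```python
import math
from collections import defaultdict

def relation_vectors(segments, seeds):
    """Map each word to a vector of co-occurrence scores with the seed words.

    Assumption: the score is the Dice coefficient over segment co-occurrence;
    the abstract only says "correlation measures".
    """
    seg_sets = [set(seg) for seg in segments]
    freq = defaultdict(int)
    for seg in seg_sets:
        for word in seg:
            freq[word] += 1
    vectors = {}
    for word in freq:
        vec = []
        for seed in seeds:
            both = sum(1 for seg in seg_sets if word in seg and seed in seg)
            denom = freq[word] + freq.get(seed, 0)
            vec.append(2.0 * both / denom if denom else 0.0)
        vectors[word] = vec
    return vectors

def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(word, src_vectors, tgt_vectors, exclude=(), top_n=20):
    """Rank target-language words by similarity to the source word's vector."""
    scored = [
        (cosine(src_vectors[word], vec), cand)
        for cand, vec in tgt_vectors.items()
        if cand not in exclude
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [cand for _, cand in scored[:top_n]]

# Hypothetical toy data: two unaligned corpora and two dictionary seed pairs.
seed_pairs = [("bank", "banque"), ("rate", "taux")]
src_segments = [["interest", "bank", "rate"], ["interest", "rate"],
                ["river", "bank"], ["weather", "sunny"]]
tgt_segments = [["interet", "banque", "taux"], ["interet", "taux"],
                ["riviere", "banque"], ["meteo", "soleil"]]

src_seeds = [s for s, _ in seed_pairs]
tgt_seeds = [t for _, t in seed_pairs]
src_vectors = relation_vectors(src_segments, src_seeds)
tgt_vectors = relation_vectors(tgt_segments, tgt_seeds)

ranking = rank_candidates("interest", src_vectors, tgt_vectors, exclude=tgt_seeds)
print(ranking)
```

Because the two vectors share the same seed ordering, position i in a source vector and position i in a target vector refer to the same dictionary entry, which is what makes the cross-corpus comparison meaningful. Returning a ranked list rather than a single best match mirrors the abstract's distinction between top-1 and top-20 candidate output.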
dblp:conf/acl-vlc/Fung97 fatcat:mp3kvqfpdrg37pr5vrd2blxhgy