Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora

Amir Hazem, Emmanuel Morin
2017 International Joint Conference on Natural Language Processing  
Bilingual lexicon extraction from comparable corpora is constrained by the small amount of available data when dealing with specialized domains. This limitation penalizes distributional approaches, whose performance depends closely on the reliability of the word co-occurrence counts extracted from comparable corpora. One way to overcome it is to associate external resources with the comparable corpus. Since bilingual word embeddings have recently proven to be efficient models for learning bilingual distributed representations of words, we explore different word embedding models and show how a general-domain comparable corpus can enrich a specialized comparable corpus via neural networks.
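The abstract does not detail the extraction step itself. A common way to use bilingual word embeddings for terminology extraction is to rank target-language candidates by cosine similarity to a source term in a shared bilingual space. The following is a minimal sketch under that assumption only. The words, the vectors, and the helper names `cosine` and `rank_translations` are hypothetical illustrations, not the authors' models, data, or code.

```python
import math

# Hypothetical toy vectors, ASSUMED to already share one bilingual space
# (e.g. produced by some bilingual embedding model); values are invented.
SOURCE_VECS = {"tumour": [0.9, 0.1, 0.2]}
TARGET_VECS = {
    "tumeur": [0.88, 0.12, 0.18],
    "sein": [0.1, 0.9, 0.3],
    "traitement": [0.3, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_translations(word, source_vecs, target_vecs, k=3):
    """Return the k target words most similar to `word`, best first, with scores."""
    query = source_vecs[word]
    scored = [(cand, cosine(query, vec)) for cand, vec in target_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for cand, score in rank_translations("tumour", SOURCE_VECS, TARGET_VECS):
        print(f"{cand}\t{score:.3f}")
```

With these invented vectors the ranking is `tumeur`, `traitement`, `sein`. In a real setting the quality of this ranking depends entirely on how well the embeddings were trained. This is exactly the point the abstract raises about the scarcity of specialized-domain data.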