Aligning words in French-English non-parallel medical texts: effect of term frequency distributions

Yun-Chuang Chiao, Pierre Zweigenbaum
2004 Studies in Health Technology and Informatics  
In this paper, we present a method for aligning words based on a statistical model of word distribution similarity. Our method rests on the premise that the patterns of word co-occurrence in texts of different languages are correlated. Using automatically downloaded pages from different medical web sites and a combined bilingual lexicon of general and medical terms as language resources, we assign a similarity score to each proposed translation pair based on the distributional contexts of the two words. We vary several parameters of the method. Experimental results confirm a positive effect of frequency, show that medical words are handled better than less specialized words, and show no clear influence of context window size. Future directions for improvement include working with very large, part-of-speech-tagged corpora.
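The general idea of scoring a candidate translation pair by comparing distributional contexts can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme or similarity measure, so the following choices are all assumptions and are not the authors' method: raw co-occurrence counts within a symmetric window of `window` tokens, translation of the source context vector through a bilingual lexicon (splitting a count evenly when a word has several translations), cosine similarity, and excluding target words that are already known translations in the lexicon. The function names (`context_vectors`, `translate_vector`, `cosine`, `rank_candidates`) and the toy corpora are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count co-occurrences of each word with its neighbours within +/- window tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vec, lexicon):
    """Map a source-language context vector into the target language.

    Assumption: context words missing from the lexicon are dropped, and a word
    with several translations shares its count evenly among them.
    """
    out = Counter()
    for word, count in vec.items():
        translations = lexicon.get(word, [])
        for t in translations:
            out[t] += count / len(translations)
    return out


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(src_word, src_vecs, tgt_vecs, lexicon, top_n=5):
    """Rank target words by context similarity to src_word.

    Assumption: target words that are already translations in the lexicon are
    not considered as candidates for an unknown source word.
    """
    translated = translate_vector(src_vecs.get(src_word, Counter()), lexicon)
    known = {t for ts in lexicon.values() for t in ts}
    scores = [(t, cosine(translated, v)) for t, v in tgt_vecs.items() if t not in known]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy, non-parallel French and English "corpora" (hypothetical data).
fr = [["le", "patient", "souffre", "de", "fièvre"],
      ["la", "fièvre", "du", "patient", "baisse"]]
en = [["the", "patient", "has", "a", "fever"],
      ["the", "fever", "of", "the", "patient", "drops"]]
lexicon = {"le": ["the"], "la": ["the"], "de": ["of"], "du": ["of", "the"],
           "patient": ["patient"], "souffre": ["suffers"], "baisse": ["drops"]}

fr_vecs = context_vectors(fr, window=2)
en_vecs = context_vectors(en, window=2)
ranking = rank_candidates("fièvre", fr_vecs, en_vecs, lexicon)
print(ranking)
```

On this toy data "fever" ranks first, followed by "has" and "a". Because `window` is a parameter, the same sketch can be rerun with different context window sizes, which is one of the parameters the abstract reports varying; with real corpora, frequency matters because rare words yield sparse, noisy context vectors.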
pmid:15360767 fatcat:mcgtnnodvjdczjnpqvw6lbupuy