An efficient method for determining bilingual word classes

Franz Josef Och
1999. Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
In statistical natural language processing we always face the problem of sparse data. One way to reduce this problem is to group words into equivalence classes, a standard method in statistical language modeling. In this paper we describe a method to determine bilingual word classes suitable for statistical machine translation. We develop an optimization criterion based on a maximum-likelihood approach and describe a clustering algorithm. We show that using the resulting bilingual word classes can improve statistical machine translation.
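The abstract does not spell out the criterion or the algorithm. The sketch below shows only the monolingual building block it alludes to: word classes for class-based language modeling, chosen by maximum likelihood. It assumes a class bigram model p(w | v) = p(C(w) | C(v)) · p(w | C(w)) and uses a greedy exchange algorithm. It is not the paper's bilingual criterion, and the function names (`h`, `class_bigram_criterion`, `exchange_clustering`) are hypothetical. Under this model the class-dependent part of the log-likelihood is Σ h(N(c, c')) − 2 Σ h(N(c)) with h(x) = x log x. This formula assumes a class's counts as predecessor and as successor are approximately equal.

```python
import math
from collections import Counter


def h(x: int) -> float:
    """x * log(x), with the convention h(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def class_bigram_criterion(corpus, assign):
    """Class-dependent part of the class-bigram log-likelihood:
    sum_{c,c'} h(N(c,c')) - 2 * sum_c h(N(c)).
    Assumes predecessor and successor class counts are both ~N(c);
    word-only terms are constant under re-clustering and are omitted."""
    classes = [assign[w] for w in corpus]
    bigram_counts = Counter(zip(classes, classes[1:]))
    class_counts = Counter(classes)
    return (sum(h(n) for n in bigram_counts.values())
            - 2 * sum(h(n) for n in class_counts.values()))


def exchange_clustering(corpus, num_classes, max_iters=10):
    """Greedy exchange algorithm: move each word to the class that most
    increases the criterion; stop when a full pass changes nothing."""
    freq = Counter(corpus)
    # Process words by descending frequency (ties broken alphabetically).
    ranked = sorted(set(corpus), key=lambda w: (-freq[w], w))
    # Arbitrary initial assignment: round-robin over frequency rank.
    assign = {w: i % num_classes for i, w in enumerate(ranked)}
    best = class_bigram_criterion(corpus, assign)
    for _ in range(max_iters):
        changed = False
        for w in ranked:
            original = assign[w]
            best_class = original
            for c in range(num_classes):
                if c == original:
                    continue
                assign[w] = c  # tentatively move w to class c
                score = class_bigram_criterion(corpus, assign)
                if score > best + 1e-12:
                    best, best_class = score, c
            assign[w] = best_class  # keep the best move (or stay put)
            changed = changed or best_class != original
        if not changed:
            break
    return assign, best


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    classes, score = exchange_clustering(corpus, num_classes=3)
    print(classes, round(score, 3))
```

This version recomputes the full criterion for every tentative move, which is clear but slow (roughly iterations × vocabulary × classes × corpus length). Practical implementations instead update the affected counts incrementally. A bilingual variant would additionally need word alignments between source and target sentences, which this sketch does not model.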
doi:10.3115/977035.977046 fatcat:kdsdpzhv5zffvhngbfxp6deolu