A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is
Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics -
In statistical natural language processing we always face the problem of sparse data. One way to reduce this problem is to group words into equivalence classes which is a standard method in statistical language modeling. In this paper we describe a method to determine bilingual word classes suitable for statistical machine translation. We develop an optimization criterion based on a maximumlikelihood approach and describe a clustering algorithm. We will show that the usage of the bilingual worddoi:10.3115/977035.977046 fatcat:kdsdpzhv5zffvhngbfxp6deolu