Bilingual corpus cleaning focusing on translation literality

Kenji Imamura, Eiichiro Sumita
2002 7th International Conference on Spoken Language Processing (ICSLP 2002)
When we automatically acquire translation knowledge from a bilingual corpus, redundant rules are generated because of translation variety. To overcome this problem, we propose bilingual corpus cleaning based on translation literality. Word-level correspondence and phrase-level correspondence are used as the criteria of literality. Using these criteria, a bilingual corpus was cleaned, and translation knowledge for a pattern-based MT system was acquired from the cleaned corpus. As a result, the translation quality of the MT system improved even though the corpus was reduced to about 81% and 87% of its original size by using word-level and phrase-level literality scores, respectively.
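The abstract does not give the formula for the literality scores. As a minimal sketch of the general idea, assuming a word-level literality score defined as the fraction of words on both sides of a sentence pair that are covered by a bilingual lexicon (a hypothetical definition, not necessarily the authors' own), corpus cleaning reduces to scoring every sentence pair and discarding pairs that fall below a threshold. The names `word_literality` and `clean_corpus`, the toy lexicon, and the threshold value are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical bilingual lexicon: source word -> set of possible target translations.
Lexicon = Dict[str, Set[str]]
Pair = Tuple[List[str], List[str]]


def word_literality(src: List[str], tgt: List[str], lexicon: Lexicon) -> float:
    """Hypothetical word-level literality score in [0, 1].

    Counts source words whose lexicon translations appear in the target
    sentence, plus target words licensed by some source word, and divides
    by the total number of words in the pair.
    """
    if not src or not tgt:
        return 0.0
    tgt_set = set(tgt)
    # Source words with at least one lexicon translation present in the target.
    matched_src = sum(1 for w in src if lexicon.get(w, set()) & tgt_set)
    # Every target word that the source sentence could license via the lexicon.
    licensed: Set[str] = set()
    for w in src:
        licensed |= lexicon.get(w, set())
    matched_tgt = sum(1 for w in tgt if w in licensed)
    return (matched_src + matched_tgt) / (len(src) + len(tgt))


def clean_corpus(pairs: List[Pair], lexicon: Lexicon, threshold: float) -> List[Pair]:
    """Keep only sentence pairs whose (hypothetical) literality score meets the threshold."""
    return [(s, t) for s, t in pairs if word_literality(s, t, lexicon) >= threshold]


if __name__ == "__main__":
    # Toy, made-up data: a literal pair and a free (non-literal) pair.
    lexicon = {"watashi": {"i"}, "ringo": {"apple", "apples"}, "suki": {"like"}}
    literal = (["watashi", "wa", "ringo", "ga", "suki"], ["i", "like", "apples"])
    free = (["ringo", "ga", "suki"], ["could", "you", "help", "me"])
    print(word_literality(*literal, lexicon))  # 6 of 8 words covered -> 0.75
    print(word_literality(*free, lexicon))     # no coverage -> 0.0
    print(len(clean_corpus([literal, free], lexicon, threshold=0.5)))  # 1
```

Raising the threshold keeps fewer, more literal pairs; the trade-off the abstract reports is that a smaller but more literal corpus can still yield better translation rules. A phrase-level criterion would apply the same filtering step with a score computed over aligned phrases instead of single words.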
doi:10.21437/icslp.2002-505