Decomposability of translation metrics for improved evaluation and efficient algorithms

David Chiang, Steve DeNeefe, Yee Seng Chan, Hwee Tou Ng
2008 Proceedings of the Conference on Empirical Methods in Natural Language Processing - EMNLP '08
B is the de facto standard for evaluation and development of statistical machine translation systems. We describe three real-world situations involving comparisons between different versions of the same systems where one can obtain improvements in B scores that are questionable or even absurd. These situations arise because B lacks the property of decomposability, a property which is also computationally convenient for various applications. We propose a very conservative modification
more » ... B and a cross between B and word error rate that address these issues while improving correlation with human judgments.
doi:10.3115/1613715.1613791