Neural Machine Translation Using Multiple Back-translation Generated by Sampling

Kenji Imamura, Atsushi Fujita, Eiichiro Sumita
2020 Transactions of the Japanese Society for Artificial Intelligence
A large-scale parallel corpus is indispensable for training encoder-decoder neural machine translation. The method of using synthetic parallel texts, called back-translation, in which target monolingual sentences are automatically translated into the source language, has proven effective in improving the decoder. However, it does not necessarily help enhance the encoder. In this paper, we propose a method that enhances not only the decoder but also the encoder using target monolingual corpora, by generating multiple source sentences via sampling-based sequence generation. Source sentences generated in this way are more diverse and thus help make the encoder robust. Our experimental results show that translation quality improved as the number of synthetic source sentences per given target sentence increased. Although the quality did not reach that achieved with a genuine parallel corpus comprising single human translations, our proposed method obtained over 50% of the improvement brought by the parallel corpus while using only its target side, i.e., monolingual data. Moreover, the proposed sampling method resulted in final translations of higher quality than n-best back-translation. These results indicate that not only the quality of back-translations but also the diversity of synthetic source sentences is crucial.
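The core contrast in the abstract, sampling-based generation yielding diverse outputs where deterministic (greedy or beam) decoding yields a single one, can be illustrated with a minimal toy sketch. Everything here is hypothetical: the vocabulary, the `next_token_probs` distribution (a stand-in for a trained decoder), and the decoding loops are illustrative only, not the authors' actual model or code.

```python
import random

# Toy vocabulary for a hypothetical decoder; "</s>" ends a sequence.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "</s>"]

def next_token_probs(prefix):
    # Hypothetical next-token distribution: a fixed base distribution
    # that increasingly favors "</s>" as the prefix grows longer.
    base = [0.2, 0.1, 0.2, 0.15, 0.15, 0.1, 0.1]
    boost = min(len(prefix) * 0.2, 0.8)
    probs = [p * (1 - boost) for p in base]
    probs[-1] += boost
    total = sum(probs)
    return [p / total for p in probs]

def greedy_decode(max_len=5):
    # Deterministic decoding: always pick the most probable token,
    # so every call produces the identical sequence.
    out = []
    for _ in range(max_len):
        probs = next_token_probs(out)
        tok = VOCAB[probs.index(max(probs))]
        if tok == "</s>":
            break
        out.append(tok)
    return tuple(out)

def sample_decode(rng, max_len=5):
    # Sampling-based decoding: draw each token from the distribution,
    # so repeated calls yield a diverse set of sequences.
    out = []
    for _ in range(max_len):
        probs = next_token_probs(out)
        tok = rng.choices(VOCAB, weights=probs)[0]
        if tok == "</s>":
            break
        out.append(tok)
    return tuple(out)

rng = random.Random(0)
greedy_outputs = {greedy_decode() for _ in range(50)}
sampled_outputs = {sample_decode(rng) for _ in range(50)}
print("unique greedy:", len(greedy_outputs))
print("unique sampled:", len(sampled_outputs))
```

Run repeatedly, greedy decoding collapses to one unique output while sampling produces many distinct ones; in the paper's setting, each such distinct synthetic source sentence is paired with the same target sentence, and that added source-side diversity is what is claimed to make the encoder more robust.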
doi:10.1527/tjsai.a-ja9