Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data

Koel Dutta Chowdhury, Mohammed Hasanuzzaman, Qun Liu
2018 Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP  
In this paper, we investigate the effectiveness of training a multimodal neural machine translation (MNMT) system with image features for a low-resource language pair, Hindi and English, using synthetic data. A three-way parallel corpus that contains bilingual texts and corresponding images is required to train an MNMT system with image features. However, such a corpus is not available for low-resource language pairs. To address this, we developed both a synthetic training dataset and a manually curated development/test dataset for Hindi based on an existing English-image parallel corpus. We used these datasets to build our image description translation system by adopting state-of-the-art MNMT models. Our results show that it is possible to train an MNMT system for low-resource language pairs through the use of synthetic data and that such a system can benefit from image features.
doi:10.18653/v1/w18-3405 dblp:conf/acl-deeplo/ChowdhuryHL18 fatcat:hmav4zatmzgazjx3wzq2nox74i