Multimodal Transformer for Multimodal Machine Translation

Shaowei Yao, Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Multimodal Machine Translation (MMT) aims to introduce information from other modalities, generally static images, to improve translation quality. Previous works propose various incorporation methods, but most of them do not consider the relative importance of the modalities. In MMT, treating text and images equally may encode too much irrelevant visual information and thereby introduce noise. In this paper, we propose a multimodal self-attention mechanism in the Transformer to address the issues above. The proposed method learns the representations of images conditioned on the text, which avoids encoding irrelevant information from images. Experiments and visualization analysis demonstrate that our model benefits from visual information and substantially outperforms previous works and competitive baselines in terms of various metrics.
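The mechanism described in the abstract can be sketched roughly as text-queried attention over the joint text-image sequence: queries come from the text only, while keys and values come from both modalities, so image representations are weighted by their relevance to the text. The function name, weight matrices, and shapes below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_self_attention(text, image, Wq, Wk, Wv):
    """Sketch of text-conditioned multimodal attention.

    text:  (T, d) text token representations
    image: (I, d) image region features (projected to dimension d)
    Queries are computed from the text alone; keys and values from
    the concatenation of text and image, so the output stays aligned
    with the text sequence and attends to images only as needed.
    """
    memory = np.concatenate([text, image], axis=0)   # (T + I, d)
    Q = text @ Wq                                    # (T, d)
    K = memory @ Wk                                  # (T + I, d)
    V = memory @ Wv                                  # (T + I, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # scaled dot-product
    return softmax(scores) @ V                       # (T, d)
```

Because the output has the text sequence length, it can replace the encoder's self-attention output directly; irrelevant image regions simply receive low attention weight.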
doi:10.18653/v1/2020.acl-main.400