DCU-UvA Multimodal MT System Report

Iacer Calixto, Desmond Elliott, Stella Frank
Proceedings of the First Conference on Machine Translation, Volume 2: Shared Task Papers (2016)
We present a doubly-attentive multimodal machine translation model. Our model uses two separate attention mechanisms within a neural translation model: one attends to the source language and the other to spatially preserving CONV5,4 visual features. In image description translation experiments (Task 1), we find an improvement of 2.3 Meteor points over initialising the hidden state of the decoder with only the FC7 features, and of 2.9 Meteor points over a text-only neural machine translation baseline, ... ing the useful nature of attending to the CONV5,4 features.
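A doubly-attentive decoder step can be illustrated with a small sketch. It assumes standard additive (Bahdanau-style) attention for both mechanisms, tiny toy dimensions, random parameters, and simple concatenation of the two context vectors. The names `AdditiveAttention` and `doubly_attentive_step` are hypothetical. This shows the general idea of two independent attention mechanisms and is not the authors' implementation. For reference, a VGG-19 network's CONV5,4 layer yields a 14×14 grid of 512-dimensional vectors for a 224×224 input. The toy image grid below stands in for such a grid.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class AdditiveAttention:
    """Hypothetical additive attention: e_i = v . tanh(W h + U a_i)."""

    def __init__(self, hid_dim, ann_dim, att_dim, rng):
        self.W = rand_matrix(att_dim, hid_dim, rng)   # projects decoder state
        self.U = rand_matrix(att_dim, ann_dim, rng)   # projects each annotation
        self.v = [rng.uniform(-0.1, 0.1) for _ in range(att_dim)]

    def __call__(self, h, annotations):
        Wh = matvec(self.W, h)
        scores = []
        for a in annotations:
            hidden = [math.tanh(x + y) for x, y in zip(Wh, matvec(self.U, a))]
            scores.append(sum(vi * t for vi, t in zip(self.v, hidden)))
        weights = softmax(scores)  # one weight per annotation, sums to 1
        dim = len(annotations[0])
        # Context vector: attention-weighted sum of the annotations.
        context = [sum(w * a[k] for w, a in zip(weights, annotations))
                   for k in range(dim)]
        return weights, context


def doubly_attentive_step(h_prev, src_annotations, img_features, src_att, img_att):
    # Each modality has its own attention parameters and its own weights.
    src_w, src_ctx = src_att(h_prev, src_annotations)
    img_w, img_ctx = img_att(h_prev, img_features)
    # Assumption: the two contexts are concatenated before the decoder update.
    return src_w, img_w, src_ctx + img_ctx


rng = random.Random(0)
HID, SRC_DIM, IMG_DIM, ATT = 8, 6, 4, 5
src = [[rng.gauss(0, 1) for _ in range(SRC_DIM)] for _ in range(7)]  # 7 source tokens
img = [[rng.gauss(0, 1) for _ in range(IMG_DIM)] for _ in range(9)]  # toy 3x3 grid
h = [0.0] * HID
src_att = AdditiveAttention(HID, SRC_DIM, ATT, rng)
img_att = AdditiveAttention(HID, IMG_DIM, ATT, rng)
src_w, img_w, ctx = doubly_attentive_step(h, src, img, src_att, img_att)
print(len(ctx))  # 10 (6 source dims + 4 image dims)
```

Keeping the two parameter sets separate lets the decoder learn one alignment over source words and a different one over image locations at every time step. Attending to a spatial grid, rather than conditioning only on a single global FC7 vector, is what distinguishes this setup from the FC7-initialisation baseline.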
doi:10.18653/v1/w16-2359 dblp:conf/wmt/CalixtoEF16 fatcat:ocpnpzalszd2pjlytammykhc2q