We present a doubly-attentive multimodal machine translation model. Our model learns to attend to source language and spatial-preserving CONV5,4 visual features as separate attention mechanisms in a neural translation model. In image description translation experiments (Task 1), we find an improvement of 2.3 Meteor points compared to initialising the hidden state of the decoder with only the FC7 features, and 2.9 Meteor points compared to a text-only neural machine translation baseline.
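To make the described architecture concrete, below is a minimal sketch of a single decoder step with two separate attention mechanisms: one over the source annotations and one over the flattened spatial CONV5,4 feature map (for VGG19 at 224x224 input, a 14x14 grid of 512-dimensional vectors). This is an illustrative reconstruction, not the authors' implementation: it assumes PyTorch, additive (Bahdanau-style) attention for both modalities, and a GRU cell, and all class, method, and parameter names (DoublyAttentiveDecoderStep, attend, hid_dim, etc.) are hypothetical.

```python
# Hypothetical sketch of one doubly-attentive decoder step; dimensions
# and additive-attention form are assumptions, not the paper's exact model.
import torch
import torch.nn as nn


class DoublyAttentiveDecoderStep(nn.Module):
    def __init__(self, hid_dim, src_dim, img_dim):
        super().__init__()
        # Text attention: scores each source annotation against the decoder state.
        self.txt_query = nn.Linear(hid_dim, hid_dim)
        self.txt_key = nn.Linear(src_dim, hid_dim)
        self.txt_score = nn.Linear(hid_dim, 1)
        # Visual attention: scores each spatial location of the CONV5,4 map.
        self.img_query = nn.Linear(hid_dim, hid_dim)
        self.img_key = nn.Linear(img_dim, hid_dim)
        self.img_score = nn.Linear(hid_dim, 1)
        # Recurrent cell consuming both context vectors at each step.
        self.cell = nn.GRUCell(src_dim + img_dim, hid_dim)

    def attend(self, state, feats, query, key, score):
        # Additive attention: e_i = v^T tanh(W_q s + W_k h_i).
        e = score(torch.tanh(query(state).unsqueeze(1) + key(feats))).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)                        # (batch, n_feats)
        return torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)  # context vector

    def forward(self, state, src_annotations, img_feats):
        # src_annotations: (batch, src_len, src_dim) encoder hidden states.
        # img_feats: (batch, 14*14, img_dim) flattened spatial CONV features.
        c_txt = self.attend(state, src_annotations,
                            self.txt_query, self.txt_key, self.txt_score)
        c_img = self.attend(state, img_feats,
                            self.img_query, self.img_key, self.img_score)
        # Each modality yields its own context vector; both feed the next state.
        return self.cell(torch.cat([c_txt, c_img], dim=-1), state)


# Toy usage with random tensors.
step = DoublyAttentiveDecoderStep(hid_dim=256, src_dim=512, img_dim=512)
state = torch.zeros(2, 256)
src = torch.randn(2, 10, 512)   # 10 source tokens
img = torch.randn(2, 196, 512)  # 14x14 spatial grid from CONV5,4
new_state = step(state, src, img)
print(new_state.shape)          # torch.Size([2, 256])
```

The key design point the abstract emphasises is visible here: the spatial feature map is attended over per location rather than collapsed into a single global vector, which is what distinguishes attending to CONV5,4 features from initialising the decoder state with the FC7 vector.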