A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020.
License: Creative Commons CC BY (see https://creativecommons.org/use-remix/cc-licenses)

Abstract
In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model. This latent variable can be seen as a multi-modal stochastic embedding of an image and its description in a foreign language. It is used in a target-language decoder and also

doi:10.18653/v1/p19-1642
dblp:conf/acl/CalixtoRA19
fatcat:y3t5oh36x5bqfagucebi424rwq