D4.1 Report on Multimodal Machine Translation

Stig-Arne Grönroos, Umut Sulubacak, Jörg Tiedemann
2018 Zenodo  
Multimodal machine translation involves drawing information from more than one modality (text, audio, and visuals), and is an emerging subject within the machine translation community. In MeMAD, multimodal translation is of particular interest in facilitating cross-lingual multimodal content retrieval, and is one of the main focuses of WP4. Although efforts in multimodal machine translation date back to the early 1990s, large-scale research has only taken place within the last decade.
Especially prominent are the multimodal tasks of spoken language translation and image caption translation, which exploit the audio and visual modalities respectively. Both of these tasks are championed by evaluation campaigns, which act as competitions to stimulate research and serve as a regulated platform for investigating evaluation methodologies. So far, one multimodal machine translation system has been developed for each task within WP4 of the MeMAD project, and the image caption translation system in particular had great success. In this deliverable, we present a survey of the state of the art in machine translation with an emphasis on multimodal tasks and systems. We then describe our own multimodal machine translation efforts carried out in WP4 within the first year of MeMAD. Finally, to conclude our report, we discuss our plans for tackling video subtitle translation and audio description translation as the next steps in WP4.
doi:10.5281/zenodo.3690761 fatcat:n3b34ooubfayxphgyf6bli6bya