20,022 Hits in 2.1 sec

Multimodal Transformer for Multimodal Machine Translation

Shaowei Yao, Xiaojun Wan
2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Multimodal Machine Translation (MMT) aims to introduce information from other modalities, generally static images, to improve the translation quality.  ...  In this paper, we propose the multimodal self-attention in Transformer to solve the issues above.  ...  We thank the anonymous reviewers for their helpful comments. Xiaojun Wan is the corresponding author.  ... 
doi:10.18653/v1/2020.acl-main.400 fatcat:k3crvaqf7zavdl3htttakyqcy4
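The snippet above mentions multimodal self-attention in a Transformer, i.e. letting text tokens attend over visual features as well as other tokens. A minimal numpy sketch of that idea (shapes and function names are illustrative assumptions, not the paper's actual architecture) might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_self_attention(text_feats, image_feats):
    """Text queries attend over the concatenation of text and image
    features (keys/values), yielding image-aware text representations.

    text_feats:  (T, d) token embeddings
    image_feats: (R, d) visual region embeddings
    returns:     (T, d)
    """
    d = text_feats.shape[-1]
    kv = np.concatenate([text_feats, image_feats], axis=0)  # (T+R, d)
    scores = text_feats @ kv.T / np.sqrt(d)                 # (T, T+R)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ kv

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))   # 5 tokens
image = rng.standard_normal((3, 16))  # 3 image regions
out = multimodal_self_attention(text, image)
print(out.shape)  # (5, 16)
```

Real implementations add learned query/key/value projections and multiple heads; this sketch only shows the joint text-plus-image attention span.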

MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish [article]

Begum Citamak and Ozan Caglayan and Menekse Kuyu and Erkut Erdem and Aykut Erdem and Pranava Madhyastha and Lucia Specia
2020 arXiv   pre-print
In our experiments, we build models for both video captioning and multimodal machine translation and investigate the effect of different word segmentation approaches and different neural architectures  ...  We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphology rich and agglutinative  ...  Multimodal Machine Translation We now fix the choice of segmentation to words for English and to SPM30K for Turkish, and proceed with the multimodal machine translation results.  ... 
arXiv:2012.07098v1 fatcat:qpyilykdtvhmzo4xp6iqvm7fjm

Adaptive Fusion Techniques for Multimodal Data [article]

Gaurav Sahu, Olga Vechtomova
2021 arXiv   pre-print
A quantitative evaluation on the tasks of multimodal machine translation and emotion recognition suggests that our lightweight, adaptive networks can better model context from other modalities than existing  ...  Instead of defining a deterministic fusion operation, such as concatenation, for the network, we let the network decide "how" to combine a given set of multimodal features more effectively.  ...  Acknowledgments We acknowledge Compute Canada's GPU support for our experiments. We also thank Dhruv Kumar for their meaningful suggestions on the manuscript.  ... 
arXiv:1911.03821v2 fatcat:b2nggb5h7reehhvdxuobehqwci
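The fusion described above replaces a fixed operation such as concatenation with a network that decides how to combine modality features. One common learned alternative is a sigmoid gate that mixes the two modalities per dimension; a minimal numpy sketch (function names and shapes are hypothetical, and the paper's adaptive networks are more elaborate) could be:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(x, y, W, b):
    """Learned gating instead of deterministic concatenation:
    the gate g in (0, 1) decides per-dimension how much of each
    modality's feature vector to keep.

    x, y: (d,) feature vectors from two modalities
    W:    (d, 2d) gate weights; b: (d,) gate bias
    """
    g = sigmoid(W @ np.concatenate([x, y]) + b)
    return g * x + (1.0 - g) * y  # elementwise convex combination

rng = np.random.default_rng(1)
d = 8
x, y = rng.standard_normal(d), rng.standard_normal(d)
W, b = rng.standard_normal((d, 2 * d)) * 0.1, np.zeros(d)
fused = gated_fusion(x, y, W, b)
print(fused.shape)  # (8,)
```

Because the gate lies strictly between 0 and 1, each fused dimension stays between the corresponding values of the two inputs, so neither modality can be silently discarded unless the gate learns to do so.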

Supervised Visual Attention for Multimodal Neural Machine Translation

Tetsuro Nishihara, Akihiro Tamura, Takashi Ninomiya, Yutaro Omote, Hideki Nakayama
2021 Journal of Natural Language Processing  
This paper proposed a supervised visual attention mechanism for multimodal neural machine translation (MNMT), trained with constraints based on manual alignments between words in a sentence and their corresponding  ...  Our experiments on English-German and German-English translation tasks using the Multi30k dataset and on English-Japanese and Japanese-English translation tasks using the Flickr30k Entities JP dataset  ...  "LIUM-CVC Submissions for WMT17 Multimodal Translation Task." In Proceedings of the 2nd Conference on Machine Translation, pp. 432-439, Copenhagen, Denmark.  ... 
doi:10.5715/jnlp.28.554 fatcat:gp6zorbdeje5jn2gi6sqlfifam

D4.1 Report on Multimodal Machine Translation

Stig-Arne Grönroos, Umut Sulubacak, Jörg Tiedemann
2018 Zenodo  
So far, one multimodal machine translation system has been developed within WP4 of the MeMAD project for either task, and especially the image caption translation system had great success.  ...  Multimodal machine translation involves drawing information from more than one modality (text, audio, and visuals), and is an emerging subject within the machine translation community.  ...  In addition the Finnish IT Center for Science (CSC) provided computational resources. We would also like to acknowledge the support by NVIDIA and their GPU grant.  ... 
doi:10.5281/zenodo.3690761 fatcat:n3b34ooubfayxphgyf6bli6bya

Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation [article]

Idris Abdulmumin, Satya Ranjan Dash, Musa Abdullahi Dawud, Shantipriya Parida, Shamsuddeen Hassan Muhammad, Ibrahim Sa'id Ahmad, Subhadarshi Panda, Ondřej Bojar, Bashir Shehu Galadanci, Bello Shehu Bello
2022 arXiv   pre-print
Multi-modal Machine Translation (MMT) enables the use of visual information to enhance the quality of translations.  ...  The Hausa Visual Genome is the first dataset of its kind and can be used for Hausa-English machine translation, multi-modal research, and image description, among various other natural language processing  ...  To describe the process of building the multimodal dataset for the Hausa language suitable for English-to-Hausa machine translation, image captioning, and multimodal research.  ... 
arXiv:2205.01133v2 fatcat:iyqz4vn2grcprasmfxtuvvti3i

Transformer-based Cascaded Multimodal Speech Translation [article]

Zixiu Wu, Ozan Caglayan, Julia Ive, Josiah Wang, Lucia Specia
2019 arXiv   pre-print
The architecture consists of an automatic speech recognition (ASR) system followed by a Transformer-based multimodal machine translation (MMT) system.  ...  This paper describes the cascaded multimodal speech translation systems developed by Imperial College London for the IWSLT 2019 evaluation campaign.  ...  are obtained from an automatic speech recognition system (ASR) and further translated into the target language using a machine translation (MT) component.  ... 
arXiv:1910.13215v3 fatcat:rjlpcmvw3nflvnpe7kt4p64waq

Multimodal machine translation through visuals and speech

Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann
2020 Machine Translation  
Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.  ...  The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality  ...  We would also like to thank Maarit Koponen for her valuable feedback and her help in establishing our discussions of machine translation evaluation.  ... 
doi:10.1007/s10590-020-09250-0 fatcat:jod3ghcsnnbipotcqp6sme4lna

Transformer-based Cascaded Multimodal Speech Translation

Zixiu Wu, Ozan Caglayan, Julia Ive, Josiah Wang, Lucia Specia
2019 Zenodo  
The architecture consists of an automatic speech recognition (ASR) system followed by a Transformer-based multimodal machine translation (MMT) system.  ...  This paper describes the cascaded multimodal speech translation systems developed by Imperial College London for the IWSLT 2019 evaluation campaign.  ...  are obtained from an automatic speech recognition system (ASR) and further translated into the target language using a machine translation (MT) component.  ... 
doi:10.5281/zenodo.3525552 fatcat:oaqieozesbd4di62d2lmcxl2ya

Adversarial Evaluation of Multimodal Machine Translation

Desmond Elliott
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
The promise of combining vision and language in multimodal machine translation is that systems will produce better translations by leveraging the image data.  ...  Our evaluation measures whether multimodal translation systems perform better given either the congruent image or a random incongruent image, in addition to the correct source language sentence.  ...  Acknowledgements Barry Haddow asked if we knew whether the additional image data actually improved the quality of multimodal machine translation.  ... 
doi:10.18653/v1/d18-1329 dblp:conf/emnlp/Elliott18 fatcat:diyoojv2krb5ncn4fsniqxzhwq

TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation [article]

Wubo Li, Wei Zou, Xiangang Li
2019 arXiv   pre-print
To tackle this, we propose the Transformer based Cross-modal Translator (TCT) to learn unimodal sequence representations by translating from other related multimodal sequences on a supervised learning  ...  Combined TCT with Multimodal Transformer Network (MTN), we evaluate MTN-TCT on the video-grounded dialogue which uses multimodality.  ...  The research field of Multimodal Machine Learning brings some unique challenges for computational researchers given the heterogeneity of the data.  ... 
arXiv:1911.05186v1 fatcat:pwcnuz2qwfeghl2yts6yc7jpuu

A Visual Attention Grounding Neural Model for Multimodal Machine Translation [article]

Mingyang Zhou, Runxiang Cheng, Yong Jae Lee, Zhou Yu
2018 arXiv   pre-print
We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information.  ...  Our model jointly optimizes the learning of a shared visual-language embedding and a translator.  ...  We also want to thank Chunting Zhou and Ozan Caglayan for suggestions on machine translation model implementation. This work was supported in part by NSF CAREER IIS-1751206.  ... 
arXiv:1808.08266v2 fatcat:g2hjoxrlobcxhhhhn6tzr46xe4

A Visual Attention Grounding Neural Model for Multimodal Machine Translation

Mingyang Zhou, Runxiang Cheng, Yong Jae Lee, Zhou Yu
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information.  ...  Our model jointly optimizes the learning of a shared visuallanguage embedding and a translator.  ...  We also want to thank Chunting Zhou and Ozan Caglayan for suggestions on machine translation model implementation. This work was supported in part by NSF CAREER IIS-1751206.  ... 
doi:10.18653/v1/d18-1400 dblp:conf/emnlp/ZhouCLY18 fatcat:ohldvjlpdnaf5ohlfy5ros7tya

Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting [article]

Po-Yao Huang, Junjie Hu, Xiaojun Chang, Alexander Hauptmann
2020 arXiv   pre-print
Unsupervised machine translation (MT) has recently achieved impressive results with monolingual corpora only.  ...  Our model employs multimodal back-translation and features pseudo visual pivoting in which we learn a shared multilingual visual-semantic embedding space and incorporate visually-pivoted captioning as  ...  The authors would like to thank the anonymous reviewers for their suggestions and Google Cloud for providing the research credits.  ... 
arXiv:2005.03119v1 fatcat:2vbqu6242ze2nd4cfw6jlnjpdu

Multimodal Machine Translation

Jiatong Liu
2021 IEEE Access  
However, most of the existing machine translation models only use text data for translation.  ...  can significantly improve the quality of multimodal neural network machine translation.  ...  Chen for their help in this work.  ... 
doi:10.1109/access.2021.3115135 fatcat:d2anaeg3qnarpfmlc4eap2urrm
Showing results 1 — 15 out of 20,022 results