Multimodal Transformer for Multimodal Machine Translation
2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
unpublished
Multimodal Machine Translation (MMT) aims to introduce information from other modalities, generally static images, to improve translation quality. ...
In this paper, we propose multimodal self-attention in the Transformer to solve the issues above. ...
We thank the anonymous reviewers for their helpful comments. Xiaojun Wan is the corresponding author. ...
doi:10.18653/v1/2020.acl-main.400
fatcat:k3crvaqf7zavdl3htttakyqcy4
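The snippet above names multimodal self-attention without detail, so here is a minimal sketch, assuming the common formulation in which text queries attend over both text states and projected image region features. The class name, dimensions, and the use of PyTorch's MultiheadAttention are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of image-aware self-attention for MMT (illustrative, not the
# paper's code): text queries attend over the concatenation of text states and
# linearly projected image region features.
import torch
import torch.nn as nn

class MultimodalSelfAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_image=2048):
        super().__init__()
        self.img_proj = nn.Linear(d_image, d_model)  # map CNN features to model dim
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, text, image):
        # text:  (batch, src_len, d_model)    token representations
        # image: (batch, n_regions, d_image)  e.g. CNN region features
        img = self.img_proj(image)
        memory = torch.cat([text, img], dim=1)  # joint key/value sequence
        out, _ = self.attn(query=text, key=memory, value=memory)
        return out  # (batch, src_len, d_model), image-informed text states
```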
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
[article]
2020
arXiv
pre-print
In our experiments, we build models for both video captioning and multimodal machine translation and investigate the effect of different word segmentation approaches and different neural architectures ...
We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphologically rich and agglutinative ...
Multimodal Machine Translation We now fix the choice of segmentation to words for English and to SPM30K for Turkish, and proceed with the multimodal machine translation results. ...
arXiv:2012.07098v1
fatcat:qpyilykdtvhmzo4xp6iqvm7fjm
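For readers unfamiliar with the SPM30K segmentation mentioned in the snippet above, a 30K-vocabulary subword model for Turkish could be trained with the sentencepiece library roughly as follows; the file names and the unigram model type are assumptions, not details taken from the paper.

```python
# Hypothetical SPM30K setup for Turkish (file names are placeholders).
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train.tr",          # raw Turkish side of the corpus
    model_prefix="spm30k.tr",
    vocab_size=30000,          # the "30K" in SPM30K
    model_type="unigram",      # assumption; BPE would also be plausible
)

sp = spm.SentencePieceProcessor(model_file="spm30k.tr.model")
print(sp.encode("Adam gitar çalıyor.", out_type=str))  # subword pieces
```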
Adaptive Fusion Techniques for Multimodal Data
[article]
2021
arXiv
pre-print
A quantitative evaluation on the tasks of multimodal machine translation and emotion recognition suggests that our lightweight, adaptive networks can better model context from other modalities than existing ...
Instead of defining a deterministic fusion operation, such as concatenation, for the network, we let the network decide "how" to combine a given set of multimodal features more effectively. ...
Acknowledgments We acknowledge Compute Canada's GPU support for our experiments. We also thank Dhruv Kumar for their meaningful suggestions on the manuscript. ...
arXiv:1911.03821v2
fatcat:b2nggb5h7reehhvdxuobehqwci
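As a rough illustration of the adaptive fusion idea in the record above, the sketch below replaces a fixed concatenation with a learned per-dimension gate over two modalities. The module structure and names are assumptions; the paper's actual networks may differ.

```python
# Illustrative gated fusion: a learned gate decides, per dimension, how much
# of each modality to keep, instead of a hard-coded fusion operation.
import torch
import torch.nn as nn

class AdaptiveGatedFusion(nn.Module):
    def __init__(self, d_text, d_other, d_out):
        super().__init__()
        self.text_proj = nn.Linear(d_text, d_out)
        self.other_proj = nn.Linear(d_other, d_out)
        self.gate = nn.Linear(d_text + d_other, d_out)

    def forward(self, text, other):
        # text: (batch, d_text); other: (batch, d_other), e.g. visual features
        g = torch.sigmoid(self.gate(torch.cat([text, other], dim=-1)))
        return g * self.text_proj(text) + (1 - g) * self.other_proj(other)
```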
Supervised Visual Attention for Multimodal Neural Machine Translation
2021
Journal of Natural Language Processing
This paper proposes a supervised visual attention mechanism for multimodal neural machine translation (MNMT), trained with constraints based on manual alignments between words in a sentence and their corresponding ...
Our experiments on English-German and German-English translation tasks using the Multi30k dataset and on English-Japanese and Japanese-English translation tasks using the Flickr30k Entities JP dataset ...
"LIUM-CVC Submissions for WMT17 Multimodal Translation Task." In Proceedings of the 2nd Conference on Machine Translation, pp. 432-439, Copenhagen, Denmark. ...
doi:10.5715/jnlp.28.554
fatcat:gp6zorbdeje5jn2gi6sqlfifam
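A supervised attention term of the kind the snippet describes could look like the following sketch: a divergence between the model's attention over image regions and an alignment-derived target distribution, added to the translation loss with some weight. The tensor names and the KL formulation are illustrative assumptions.

```python
# Sketch of an attention supervision term: push the model's attention over
# image regions toward a distribution derived from manual word-region
# alignments. Would be added to the translation loss with a tunable weight.
import torch
import torch.nn.functional as F

def attention_supervision_loss(attn, target):
    # attn:   (batch, tgt_len, n_regions)  model attention, rows sum to 1
    # target: (batch, tgt_len, n_regions)  alignment-derived distribution
    log_attn = torch.log(attn.clamp_min(1e-9))
    return F.kl_div(log_attn, target, reduction="batchmean")
```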
D4.1 Report on Multimodal Machine Translation
2018
Zenodo
So far, one multimodal machine translation system has been developed within WP4 of the MeMAD project for each task, and the image caption translation system in particular has been very successful. ...
Multimodal machine translation involves drawing information from more than one modality (text, audio, and visuals), and is an emerging subject within the machine translation community. ...
In addition, the Finnish IT Center for Science (CSC) provided computational resources. We would also like to acknowledge the support of NVIDIA and their GPU grant. ...
doi:10.5281/zenodo.3690761
fatcat:n3b34ooubfayxphgyf6bli6bya
Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
[article]
2022
arXiv
pre-print
Multi-modal Machine Translation (MMT) enables the use of visual information to enhance the quality of translations. ...
The Hausa Visual Genome is the first dataset of its kind and can be used for Hausa-English machine translation, multi-modal research, and image description, among various other natural language processing ...
To describe the process of building the multimodal dataset for the Hausa language suitable for English-to-Hausa machine translation, image captioning, and multimodal research. 2. ...
arXiv:2205.01133v2
fatcat:iyqz4vn2grcprasmfxtuvvti3i
Transformer-based Cascaded Multimodal Speech Translation
[article]
2019
arXiv
pre-print
The architecture consists of an automatic speech recognition (ASR) system followed by a Transformer-based multimodal machine translation (MMT) system. ...
This paper describes the cascaded multimodal speech translation systems developed by Imperial College London for the IWSLT 2019 evaluation campaign. ...
are obtained from an automatic speech recognition system (ASR) and further translated into the target language using a machine translation (MT) component. ...
arXiv:1910.13215v3
fatcat:rjlpcmvw3nflvnpe7kt4p64waq
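The cascade described in the record above can be summarized in a few lines; `transcribe` and `translate` are hypothetical interfaces standing in for the ASR and MMT components.

```python
# A cascade in miniature (purely illustrative interfaces): transcribe speech
# with an ASR model, then translate the transcript together with the visual
# features in an MMT model.
def cascaded_speech_translation(audio, video_features, asr_model, mmt_model):
    transcript = asr_model.transcribe(audio)  # source-language text
    return mmt_model.translate(transcript, video_features)
```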
Multimodal machine translation through visuals and speech
2020
Machine Translation
Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. ...
The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality ...
We would also like to thank Maarit Koponen for her valuable feedback and her help in establishing our discussions of machine translation evaluation. ...
doi:10.1007/s10590-020-09250-0
fatcat:jod3ghcsnnbipotcqp6sme4lna
Transformer-based Cascaded Multimodal Speech Translation
2019
Zenodo
The architecture consists of an automatic speech recognition (ASR) system followed by a Transformer-based multimodal machine translation (MMT) system. ...
This paper describes the cascaded multimodal speech translation systems developed by Imperial College London for the IWSLT 2019 evaluation campaign. ...
are obtained from an automatic speech recognition system (ASR) and further translated into the target language using a machine translation (MT) component. ...
doi:10.5281/zenodo.3525552
fatcat:oaqieozesbd4di62d2lmcxl2ya
Adversarial Evaluation of Multimodal Machine Translation
2018
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
The promise of combining vision and language in multimodal machine translation is that systems will produce better translations by leveraging the image data. ...
Our evaluation measures whether multimodal translation systems perform better given either the congruent image or a random incongruent image, in addition to the correct source language sentence. ...
Acknowledgements Barry Haddow asked if we knew whether the additional image data actually improved the quality of multimodal machine translation. ...
doi:10.18653/v1/d18-1329
dblp:conf/emnlp/Elliott18
fatcat:diyoojv2krb5ncn4fsniqxzhwq
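The evaluation the snippet describes reads naturally as a paired comparison; below is an illustrative sketch in which each output is scored against the reference once with the congruent image and once with a randomly drawn incongruent one. Function names and the sampling scheme are assumptions, not the paper's exact protocol.

```python
# Sketch of the congruent-vs-incongruent probe: a system that truly uses the
# image should score worse when given a random incongruent image.
import random

def incongruence_gap(model, data, metric):
    # data: list of (source_sentence, image, reference) triples
    # metric: sentence-level quality score, e.g. METEOR(hypothesis, reference)
    congruent, incongruent = [], []
    for src, img, ref in data:
        wrong_img = random.choice([i for _, i, _ in data if i is not img])
        congruent.append(metric(model.translate(src, img), ref))
        incongruent.append(metric(model.translate(src, wrong_img), ref))
    n = len(data)
    return sum(congruent) / n - sum(incongruent) / n  # positive gap = image-aware
```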
TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation
[article]
2019
arXiv
pre-print
To tackle this, we propose the Transformer-based Cross-modal Translator (TCT) to learn unimodal sequence representations by translating from other related multimodal sequences on a supervised learning ...
Combining TCT with the Multimodal Transformer Network (MTN), we evaluate MTN-TCT on video-grounded dialogue, which uses multiple modalities. ...
The research field of Multimodal Machine Learning brings some unique challenges for computational researchers given the heterogeneity of the data. ...
arXiv:1911.05186v1
fatcat:pwcnuz2qwfeghl2yts6yc7jpuu
A Visual Attention Grounding Neural Model for Multimodal Machine Translation
[article]
2018
arXiv
pre-print
We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information. ...
Our model jointly optimizes the learning of a shared visual-language embedding and a translator. ...
We also want to thank Chunting Zhou and Ozan Caglayan for suggestions on machine translation model implementation. This work was supported in part by NSF CAREER IIS-1751206. ...
arXiv:1808.08266v2
fatcat:g2hjoxrlobcxhhhhn6tzr46xe4
A Visual Attention Grounding Neural Model for Multimodal Machine Translation
2018
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information. ...
Our model jointly optimizes the learning of a shared visual-language embedding and a translator. ...
We also want to thank Chunting Zhou and Ozan Caglayan for suggestions on machine translation model implementation. This work was supported in part by NSF CAREER IIS-1751206. ...
doi:10.18653/v1/d18-1400
dblp:conf/emnlp/ZhouCLY18
fatcat:ohldvjlpdnaf5ohlfy5ros7tya
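Jointly optimizing a shared visual-language embedding and a translator, as both records above describe, typically means adding a ranking term to the translation loss; the sketch below uses a standard max-margin formulation as an assumption about the general shape of such an objective, not the paper's exact one.

```python
# Sketch of joint training: translation cross-entropy plus a max-margin loss
# that pulls matching sentence/image embeddings together (illustrative).
import torch

def joint_loss(trans_loss, sent_emb, img_emb, margin=0.1):
    # sent_emb, img_emb: (batch, d), L2-normalized; diagonal pairs match
    sim = sent_emb @ img_emb.t()                    # (batch, batch) similarities
    pos = sim.diag().unsqueeze(1)                   # matching-pair similarity
    ranking = (margin + sim - pos).clamp_min(0)     # hinge on negatives
    ranking = ranking - torch.diag(ranking.diag())  # ignore the positive pairs
    return trans_loss + ranking.mean()
```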
Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting
[article]
2020
arXiv
pre-print
Unsupervised machine translation (MT) has recently achieved impressive results with monolingual corpora only. ...
Our model employs multimodal back-translation and features pseudo visual pivoting in which we learn a shared multilingual visual-semantic embedding space and incorporate visually-pivoted captioning as ...
The authors would like to thank the anonymous reviewers for their suggestions and Google Cloud for providing the research credits. ...
arXiv:2005.03119v1
fatcat:2vbqu6242ze2nd4cfw6jlnjpdu
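Multimodal back-translation, as mentioned in the snippet above, can be outlined as follows; the model interfaces are hypothetical placeholders for the forward and backward translation components, with the image serving as the pivot signal.

```python
# Multimodal back-translation in outline (illustrative interfaces): translate
# a caption to the other language, then reconstruct it from the synthetic
# translation plus the image.
def multimodal_backtranslation_step(src_caption, image, fwd_model, bwd_model):
    synthetic_tgt = fwd_model.translate(src_caption, image)  # src -> tgt
    loss = bwd_model.reconstruction_loss(synthetic_tgt, image,
                                         target=src_caption)  # tgt -> src
    return loss
```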
Multimodal Machine Translation
2021
IEEE Access
However, most of the existing machine translation models only use text data for translation. ...
can significantly improve the quality of multimodal neural network machine translation. ...
Chen for their help in this work. ...
doi:10.1109/access.2021.3115135
fatcat:d2anaeg3qnarpfmlc4eap2urrm
Showing results 1 — 15 out of 20,022 results