2,736 Hits in 5.4 sec

Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities [article]

Hai Pham, Paul Pu Liang, Thomas Manzini, Louis-Philippe Morency, Barnabas Poczos
2020 arXiv   pre-print
In this paper, we propose a method to learn robust joint representations by translating between modalities.  ...  However, existing work learns joint representations by requiring all modalities as input; as a result, the learned representations may be sensitive to noisy or missing modalities at test time.  ...  Acknowledgements PPL and LPM are partially supported by the NSF (Award #1833355) and Oculus VR. HP and BP are supported by NSF grant IIS1563887 and the DARPA D3M program.  ...
arXiv:1812.07809v2 fatcat:fvxwz2xhsnfp7ny6shbzola6gm
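
A minimal PyTorch sketch of the cyclic-translation idea summarized above: translate the source modality to the target, then translate the prediction back, so the intermediate representation can be used with only the source modality at test time. The module shapes and the 300-d/74-d feature sizes are illustrative assumptions, not the authors' MCTN implementation.

import torch
import torch.nn as nn

class CyclicTranslator(nn.Module):
    # Source -> target translation plus a cyclic back-translation; the joint
    # representation z requires only the source modality once trained.
    def __init__(self, src_dim, tgt_dim, hid=128):
        super().__init__()
        self.enc_src = nn.Sequential(nn.Linear(src_dim, hid), nn.ReLU())
        self.dec_tgt = nn.Linear(hid, tgt_dim)
        self.enc_tgt = nn.Sequential(nn.Linear(tgt_dim, hid), nn.ReLU())
        self.dec_src = nn.Linear(hid, src_dim)

    def forward(self, src):
        z = self.enc_src(src)                          # joint representation
        tgt_hat = self.dec_tgt(z)                      # forward translation
        src_hat = self.dec_src(self.enc_tgt(tgt_hat))  # cyclic back-translation
        return z, tgt_hat, src_hat

model = CyclicTranslator(src_dim=300, tgt_dim=74)      # e.g. text -> audio features
src, tgt = torch.randn(8, 300), torch.randn(8, 74)
z, tgt_hat, src_hat = model(src)
loss = nn.functional.mse_loss(tgt_hat, tgt) + nn.functional.mse_loss(src_hat, src)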

Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion [article]

Sijie Mai and Haifeng Hu and Songlong Xing
2020 arXiv   pre-print
Visualization of the learned embeddings suggests that the joint embedding space learned by our method is discriminative. Code is available at:  ...  Learning a joint embedding space for various modalities is of vital importance for multimodal fusion.  ...  Moreover, the Multimodal Cyclic Translation Network (MCTN) (Pham et al. 2019) learns joint representations via an encoder-decoder framework by translating the encoder's input (source modality) into the decoder's output  ...
arXiv:1911.07848v4 fatcat:5chcpemydjf5parolbjbeaojly
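
A minimal sketch of the adversarial ingredient described in this abstract, assuming linear encoders and a binary modality discriminator: the discriminator learns to tell which modality an embedding came from, and the encoders are trained to fool it so the joint space becomes modality-invariant. The graph-fusion stage is omitted, and none of this reflects the paper's actual networks.

import torch
import torch.nn as nn

# Illustrative dimensions; each encoder maps its modality into a shared 64-d space.
enc_text = nn.Sequential(nn.Linear(300, 64), nn.ReLU())
enc_audio = nn.Sequential(nn.Linear(74, 64), nn.ReLU())
disc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))  # modality classifier

bce = nn.BCEWithLogitsLoss()
text, audio = torch.randn(8, 300), torch.randn(8, 74)
z_t, z_a = enc_text(text), enc_audio(audio)

# Discriminator step: tell text embeddings (label 1) from audio embeddings (label 0).
d_loss = bce(disc(z_t.detach()), torch.ones(8, 1)) + \
         bce(disc(z_a.detach()), torch.zeros(8, 1))

# Encoder step: fool the discriminator so the joint space is modality-invariant.
g_loss = bce(disc(z_a), torch.ones(8, 1))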

Mix and match networks: cross-modal alignment for zero-pair image-to-image translation [article]

Yaxing Wang, Luis Herranz, Joost van de Weijer
2020 arXiv   pre-print
This paper addresses the problem of inferring unseen cross-modal image-to-image translations between multiple modalities.  ...  information between the unseen modalities.  ...  Acknowledgements The Titan Xp used for this research was donated by the NVIDIA Corporation.  ... 
arXiv:1903.04294v2 fatcat:m4cecpfwbnhwrlx5rsngblxhke

Mix and match networks: encoder-decoder alignment for zero-pair image translation [article]

Yaxing Wang, Joost van de Weijer, Luis Herranz
2018 arXiv   pre-print
We address the problem of image translation between domains or modalities for which no direct paired data is available (i.e. zero-pair translation).  ...  between domains or modalities for which explicit paired samples were not seen during training.  ...  [20] show that unsupervised mappings can be learned by imposing a joint latent space between the encoder and the decoder.  ... 
arXiv:1804.02199v1 fatcat:hj5wbxobsndqjmy4hkzz3g6l5m
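
The zero-pair idea lends itself to a short sketch: one encoder and one decoder per modality, all aligned to a shared latent space, so any encoder can be cascaded with any decoder, including pairings never seen together during training. The linear layers and modality names below are hypothetical stand-ins for the paper's convolutional networks.

import torch
import torch.nn as nn

latent = 64
encoders = nn.ModuleDict({m: nn.Linear(128, latent) for m in ("rgb", "depth", "seg")})
decoders = nn.ModuleDict({m: nn.Linear(latent, 128) for m in ("rgb", "depth", "seg")})

def translate(x, src, dst):
    # Any encoder cascades with any decoder through the shared latent space;
    # "zero-pair" directions never observed jointly in training still connect.
    return decoders[dst](encoders[src](x))

# Suppose rgb<->depth and rgb<->seg pairs were seen in training;
# depth -> seg below is then a zero-pair translation.
y = translate(torch.randn(4, 128), "depth", "seg")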

Mix and Match Networks: Encoder-Decoder Alignment for Zero-Pair Image Translation

Yaxing Wang, Joost van de Weijer, Luis Herranz
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
We address the problem of image translation between domains or modalities for which no direct paired data is available (i.e. zero-pair translation).  ...  between domains or modalities for which explicit paired samples were not seen during training.  ...  [20] show that unsupervised mappings can be learned by imposing a joint latent space between the encoder and the decoder.  ... 
doi:10.1109/cvpr.2018.00573 dblp:conf/cvpr/WangWH18 fatcat:n7h6icfxxbbcbh5si4cmkzllra

Unsupervised Attention-guided Image to Image Translation [article]

Youssef A. Mejjati and Christian Richardt and James Tompkin and Darren Cosker and Kwang In Kim
2018 arXiv   pre-print
Motivated by the important role of attention in human perception, we tackle this limitation by introducing unsupervised attention mechanisms that are jointly adversarially trained with the generators and  ...  We demonstrate qualitatively and quantitatively that our approach is able to attend to relevant regions in the image without requiring supervision, and that by doing so it achieves more realistic mappings  ...  Huang et al.'s multi-modal UNIT (MUNIT) [17] framework extends this idea to multi-modal image-to-image translation by assuming two latent representations: one for 'style' and one for 'content'.  ...
arXiv:1806.02311v3 fatcat:m5drdzhaizajhiaxtehrhsef6a
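
A minimal sketch of the attention-masked composition this line of work relies on: a learned soft mask selects the pixels to translate while the rest of the input is passed through unchanged. The single-convolution networks are placeholders, not the paper's generators.

import torch
import torch.nn as nn

class AttentionGuidedGenerator(nn.Module):
    # A soft mask picks the foreground to translate; background pixels are
    # copied from the input, keeping irrelevant regions untouched.
    def __init__(self):
        super().__init__()
        self.translate = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
        self.attend = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        a = self.attend(x)                      # soft mask in [0, 1]
        return a * self.translate(x) + (1 - a) * x

out = AttentionGuidedGenerator()(torch.randn(2, 3, 64, 64))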

Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion

Sijie Mai, Haifeng Hu, Songlong Xing
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and the Thirty-Second Innovative Applications of Artificial Intelligence Conference
Visualization of the learned embeddings suggests that the joint embedding space learned by our method is discriminative.  ...  Learning a joint embedding space for various modalities is of vital importance for multimodal fusion.  ...  Moreover, the Multimodal Cyclic Translation Network (MCTN) (Pham et al. 2019) learns joint representations via an encoder-decoder framework by translating the encoder's input (source modality) into the decoder's output  ...
doi:10.1609/aaai.v34i01.5347 fatcat:d2u2ckavfbeuhfkl54ojt6xexa

From Research to Innovation: Exploring the Translation Journey with OpenInnoTrain [article]

Anne-Laure Mention, Massimo Menichinelli
2021 Zenodo  
Identifying, developing and scaling up breakthrough technologies, and converting them into incremental, radical or disruptive innovations that are widely accepted by, and available to, beneficiaries, users  ...  Over the last 18 months of the COVID-19 pandemic, we have experienced unprecedented circumstances in our lifetime that have demonstrated the critical role of knowledge creation and transmission across industries  ...  Distinct approach to knowledge production and use: At this juncture, it needs to be noted that most of the research translation and UIC literature has been based on patent data or studies involving natural  ...
doi:10.5281/zenodo.5536932 fatcat:7pfwvjaznfagjgcapb2bm2zkuu

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions [article]

Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha
2021 arXiv   pre-print
The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer of knowledge between modalities, including their representations and predictive  ...  Multimodal machine learning involves multiple aspects: representation, translation, alignment, fusion, and co-learning.  ...  Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in  ...
arXiv:2107.13782v2 fatcat:s4spofwxjndb7leqbcqnwbifq4

PSIGAN: Joint probabilistic segmentation and image distribution matching for unpaired cross-modality adaptation based MRI segmentation [article]

Jue Jiang, Yu Chi Hu, Neelam Tyagi, Andreas Rimner, Nancy Lee, Joseph O. Deasy, Sean Berry, Harini Veeraraghavan
2020 arXiv   pre-print
Our UDA approach models the co-dependency between images and their segmentation as a joint probability distribution using a new structure discriminator.  ...  This leads to a cyclical optimization of both the generator and segmentation sub-networks that are jointly trained as part of an end-to-end network.  ...  This issue has been avoided by using joint latent space learning with variational autoencoders [7] and disentangled feature representations [26].  ...
arXiv:2007.09465v1 fatcat:mzm7lgdjjzc63jaesk2jwmesqq
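
The joint-distribution idea can be illustrated with a toy structure discriminator that scores image and segmentation channels together, so matching the image distribution alone is not enough. Channel counts and the two-layer network are assumptions for illustration, not PSIGAN's architecture.

import torch
import torch.nn as nn

# Toy structure discriminator: scores (image, segmentation-probability) pairs
# jointly, so the generator and the segmenter must match the joint
# distribution rather than the image distribution alone.
struct_disc = nn.Sequential(
    nn.Conv2d(1 + 2, 32, 4, stride=2, padding=1),   # 1 image + 2 class channels
    nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 4, stride=2, padding=1),
)

img = torch.randn(2, 1, 128, 128)                          # translated image
seg = torch.softmax(torch.randn(2, 2, 128, 128), dim=1)    # segmenter output
score = struct_disc(torch.cat([img, seg], dim=1))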

Deep Multimodal Emotion Recognition on Human Speech: A Review

Panagiotis Koromilas, Theodoros Giannakopoulos
2021 Applied Sciences  
Finally, we conclude this work with an in-depth analysis of the future challenges related to validation procedures, representation learning and method robustness.  ...  In addition, we review the basic feature representation methods for each modality, and we present aggregated evaluation results on the reported methodologies.  ...  The multimodal cyclic translation network (MCTN) learns a joint representation space for different modalities.  ... 
doi:10.3390/app11177962 fatcat:cezjfmjmvbgapo3tdz5j3iecp4

Binding and Perspective Taking as Inference in a Generative Neural Network Model [article]

Mahdi Sadeghi, Fabian Schrodt, Sebastian Otte, Martin V. Butz
2020 arXiv   pre-print
Here we focus on a generative encoder-decoder architecture that adapts its perspective and binds features by means of retrospective inference.  ...  In addition, redundant feature properties and population encodings are shown to be highly useful.  ...  Even if some latent activity of Joint 2 is added to Joint 1, this disruption seems to be minor, also indicating robust Gestalt perception.  ... 
arXiv:2012.05152v1 fatcat:aue6snwgyzcfplwm4gcahywdum
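
Reading "retrospective inference" as adapting latent states rather than weights, a minimal sketch freezes a decoder and infers the latent code by gradient descent on the reconstruction error. The toy decoder and all dimensions are assumptions.

import torch
import torch.nn as nn

# Freeze a trained decoder and infer the latent state (standing in for
# perspective/binding variables) by gradient descent on the output error.
decoder = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 32))
for p in decoder.parameters():
    p.requires_grad_(False)

target = torch.randn(1, 32)                  # observed feature pattern
z = torch.zeros(1, 16, requires_grad=True)   # latent state to be inferred
opt = torch.optim.Adam([z], lr=0.05)

for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(z), target)
    loss.backward()                          # gradients reach z only
    opt.step()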

Quality Guided Sketch-to-Photo Image Synthesis [article]

Uche Osahor, Hadi Kazemi, Ali Dabouei, Nasser Nasrabadi
2020 arXiv   pre-print
Facial sketches drawn by artists are widely used for visual identification applications, mostly by law enforcement agencies, but the quality of these sketches depends on the ability of the artist to  ...  We synthesised sketches using the XDoG filter for the CelebA, WVU Multi-modal and CelebA-HQ datasets and from an auxiliary generator trained on sketches from CUHK, IIT-D and FERET datasets.  ...  [47] attempted to solve this problem by treating image retrieval as a search in the learned feature embedding space.  ...
arXiv:2005.02133v1 fatcat:nmwjpfolafcnjkwuqks2jllmwi
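
XDoG itself is a published edge-stylization filter (an extended difference of Gaussians followed by a soft threshold), so the sketch-synthesis step mentioned above is reproducible in a few lines. The parameter values below are common defaults, not necessarily those used in the paper.

import numpy as np
from scipy.ndimage import gaussian_filter

def xdog(img, sigma=0.8, k=1.6, p=20.0, eps=0.01, phi=10.0):
    # Sharpened difference of two Gaussian blurs.
    d = (1 + p) * gaussian_filter(img, sigma) - p * gaussian_filter(img, k * sigma)
    # Soft threshold: white above eps, smooth tanh falloff below it.
    return np.where(d >= eps, 1.0, 1.0 + np.tanh(phi * (d - eps)))

edges = xdog(np.random.rand(64, 64))   # grayscale input in [0, 1]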

A Survey Of Cross-lingual Word Embedding Models [article]

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 arXiv   pre-print
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models  ...  In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions.  ...  Ivan's work is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no. 648909).  ...
arXiv:1706.04902v3 fatcat:lts6uop77zaazhzlbygqmdsama
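
One model family such surveys cover is projection-based mapping: align two pretrained monolingual spaces with an orthogonal matrix fit on a seed dictionary, the orthogonal Procrustes solution. A minimal NumPy sketch, with random vectors standing in for real embeddings:

import numpy as np

def procrustes_map(X, Y):
    # Orthogonal W minimizing ||XW - Y||_F, fitted on a seed dictionary;
    # X, Y are (n, d) source/target vectors for n translation pairs.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(1000, 50)), rng.normal(size=(1000, 50))
W = procrustes_map(X, Y)
mapped = X @ W   # source vectors expressed in the target space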

IQ-VQA: Intelligent Visual Question Answering [article]

Vatsal Goel, Mohit Chandak, Ashish Anand, Prithwijit Guha
2020 arXiv   pre-print
In addition, we also quantitatively show improvement in attention maps, which highlights better multi-modal understanding of vision and language.  ...  To this end, we propose a model-independent cyclic framework which increases the consistency and robustness of any VQA architecture.  ...  This proves that by learning on these variations, our framework not only improves the consistency and robustness of any generic VQA model but also achieves a stronger multi-modal understanding of vision  ...
arXiv:2007.04422v1 fatcat:3kymlbz3b5brzpnvjpd7unvesa
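
A generic sketch of such a cyclic consistency check: answer a question, regenerate a question from the predicted answer, and penalize disagreement between the two answer distributions. The toy modules below only fix an interface and are not the IQ-VQA networks.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVQA(nn.Module):
    # Stand-in for any VQA model: (image features, question features) -> answer logits.
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32, 10)

    def forward(self, img, q):
        return self.fc(img + q)

class ToyQGen(nn.Module):
    # Stand-in question generator: (image features, answer one-hot) -> question features.
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32 + 10, 32)

    def forward(self, img, ans):
        return self.fc(torch.cat([img, ans], dim=-1))

vqa, qgen = ToyVQA(), ToyQGen()
img, q = torch.randn(4, 32), torch.randn(4, 32)

logits = vqa(img, q)                               # answer the original question
ans = F.one_hot(logits.argmax(-1), 10).float()
q_hat = qgen(img, ans)                             # cycle: regenerate a question
logits_cycle = vqa(img, q_hat)                     # answer the regenerated question
consistency = F.kl_div(logits_cycle.log_softmax(-1),
                       logits.softmax(-1), reduction="batchmean")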
Showing results 1–15 out of 2,736 results