
Learning Factorized Multimodal Representations [article]

Yao-Hung Hubert Tsai and Paul Pu Liang and Amir Zadeh and Louis-Philippe Morency and Ruslan Salakhutdinov
2019 arXiv   pre-print
Lastly, we interpret our factorized representations to understand the interactions that influence multimodal learning.  ...  We introduce a model that factorizes representations into two sets of independent factors: multimodal discriminative and modality-specific generative factors.  ...  Conclusion: In this paper, we proposed the Multimodal Factorization Model (MFM) for multimodal representation learning.  ...
arXiv:1806.06176v3 fatcat:7s4ero4yyfetpmg3mboojbgabu
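
To make the factorization described above concrete, here is a minimal PyTorch sketch of the idea (an illustration, not the authors' MFM code; the module name, dimensions, and the mean-pooling of discriminative factors are all assumptions): each modality is encoded into a shared discriminative factor used for prediction and a modality-specific generative factor used, together with the discriminative factor, to reconstruct that modality.

    # Illustrative MFM-style factorization (hypothetical names and sizes).
    import torch
    import torch.nn as nn

    class FactorizedMultimodal(nn.Module):
        def __init__(self, input_dims, disc_dim=32, gen_dim=16, num_classes=2):
            super().__init__()
            # One discriminative and one generative encoder per modality.
            self.disc_encoders = nn.ModuleList(
                [nn.Linear(d, disc_dim) for d in input_dims])
            self.gen_encoders = nn.ModuleList(
                [nn.Linear(d, gen_dim) for d in input_dims])
            # Each modality is reconstructed from both factor sets.
            self.decoders = nn.ModuleList(
                [nn.Linear(disc_dim + gen_dim, d) for d in input_dims])
            self.classifier = nn.Linear(disc_dim, num_classes)

        def forward(self, xs):  # xs: list of per-modality tensors (B, d_m)
            # Pool per-modality discriminative codes into one shared factor.
            z_y = torch.stack([e(x) for e, x in zip(self.disc_encoders, xs)]).mean(0)
            z_gen = [e(x) for e, x in zip(self.gen_encoders, xs)]
            recons = [dec(torch.cat([z_y, z], dim=-1))
                      for dec, z in zip(self.decoders, z_gen)]
            return self.classifier(z_y), recons

Training would then combine a supervised loss on the logits with per-modality reconstruction losses, e.g. loss = ce(logits, y) + sum(mse(r, x) for r, x in zip(recons, xs)).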

Modality-based Factorization for Multimodal Fusion

Elham J. Barezi, Pascale Fung
2019 Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)  
Applying a modality-based tensor factorization method, which adopts different factors for different modalities, results in removing information present in one modality that can be compensated for by the other modalities  ...  We have applied this method to three different multimodal datasets in sentiment analysis, personality trait recognition, and emotion recognition.  ...  Methodology: Tucker Factorization for Multimodal Learning. Modality-based Redundancy Reduction Fusion (MRRF): We have used Tucker's tensor decomposition method (Tucker, 1966; Hitchcock, 1927), which  ...
doi:10.18653/v1/w19-4331 dblp:conf/rep4nlp/BareziF19 fatcat:5c4iagoif5eqvkuwbvy2f6qgrq
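
A minimal sketch of Tucker-style fusion for three modalities (an illustration of the decomposition, not the MRRF release; the ranks r1, r2, r3 and all names are hypothetical): each modality gets its own factor matrix, so the per-modality rank controls how much of that modality's information is retained, and a shared core tensor mixes the projected modalities.

    # Illustrative Tucker-style trimodal fusion.
    import torch
    import torch.nn as nn

    class TuckerFusion(nn.Module):
        def __init__(self, d1, d2, d3, r1=8, r2=8, r3=8, out_dim=2):
            super().__init__()
            # Modality-specific factor matrices with separate ranks.
            self.U1 = nn.Linear(d1, r1, bias=False)
            self.U2 = nn.Linear(d2, r2, bias=False)
            self.U3 = nn.Linear(d3, r3, bias=False)
            # Shared core tensor mapping the joint factors to the output.
            self.core = nn.Parameter(torch.randn(r1, r2, r3, out_dim) * 0.01)

        def forward(self, x1, x2, x3):
            h1, h2, h3 = self.U1(x1), self.U2(x2), self.U3(x3)
            # Contract the core with the three projected modalities.
            return torch.einsum('bi,bj,bk,ijko->bo', h1, h2, h3, self.core)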

Learning to Fuse Latent Representations for Multimodal Data

Oyebade K. Oyedotun, Djamila Aouada, Bjorn Ottersten
2019 ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper, we propose to instead learn the fusion of latent representations for multimodal data by using a modality gating mechanism that allows the dynamic weighting of extracted latent representations  ...  Multimodal learning leverages data from different modalities to improve the performance of a trained model.  ...  Therefore, the terms latent representation and factors of variation are used interchangeably. Multimodal learning becomes useful when F_A ∪ F_B ≠ F_A; this is the goal of multimodal learning.  ...
doi:10.1109/icassp.2019.8682454 dblp:conf/icassp/OyedotunAO19 fatcat:vqwlcn6wnjcuddmhklunmyi4p4
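
A minimal sketch of such a gating mechanism, under assumed shapes (not the paper's implementation): a small gate network predicts one weight per modality from the concatenated latents, and the fused representation is their softmax-weighted sum.

    # Illustrative modality gating over a list of latent representations.
    import torch
    import torch.nn as nn

    class GatedFusion(nn.Module):
        def __init__(self, latent_dim, num_modalities):
            super().__init__()
            # Predict one gate value per modality from all latents.
            self.gate = nn.Linear(latent_dim * num_modalities, num_modalities)

        def forward(self, latents):                 # list of (B, D) tensors
            stacked = torch.stack(latents, dim=1)   # (B, M, D)
            w = torch.softmax(self.gate(torch.cat(latents, dim=-1)), dim=-1)
            return (w.unsqueeze(-1) * stacked).sum(dim=1)   # (B, D)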

How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents [article]

Miguel Vasco, Hang Yin, Francisco S. Melo, Ana Paiva
2021 arXiv   pre-print
The proposed model learns hierarchical representations: low-level modality-specific representations, encoded from raw observation data, and a high-level multimodal representation, encoding joint-modality  ...  This work addresses the problem of sensing the world: how to learn a multimodal representation of a reinforcement learning agent's environment that allows the execution of tasks under incomplete perceptual  ...  Figure 3: Learning hierarchical representations in the MUSE framework: (a) the modality-specific factors z_{1:M}, employing the loss in Eq. (2); (b) the multimodal factors z_π, encoded from  ...
arXiv:2110.03608v1 fatcat:3s74ggrpybfefm5rkvxqieplmy
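
A minimal sketch of the two-level hierarchy described above (illustrative, not the MUSE code; averaging only the observed modalities is an assumption that permits incomplete perceptions at test time): modality-specific encoders produce low-level factors, and a top encoder maps their aggregate to a single multimodal factor.

    # Illustrative hierarchical multimodal encoder.
    import torch
    import torch.nn as nn

    class HierarchicalEncoder(nn.Module):
        def __init__(self, input_dims, low_dim=32, high_dim=16):
            super().__init__()
            # Low level: one encoder per modality.
            self.low = nn.ModuleList([nn.Linear(d, low_dim) for d in input_dims])
            # High level: one encoder over the aggregated low-level factors.
            self.high = nn.Linear(low_dim, high_dim)

        def forward(self, xs, available=None):   # xs: per-modality tensors
            zs = [enc(x) for enc, x in zip(self.low, xs)]
            if available is not None:            # indices of observed modalities
                zs = [zs[i] for i in available]
            return self.high(torch.stack(zs).mean(0))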

Online Matrix Factorization for Multimodal Image Retrieval [chapter]

Juan C. Caicedo, Fabio A. González
2012 Lecture Notes in Computer Science  
The method combines both data sources and generates one multimodal representation using latent factor analysis and matrix factorization.  ...  Another important characteristic of the method is that the multimodal representation is learned online using an efficient stochastic gradient descent formulation.  ...  The computational methods used in this work for learning such a multimodal space are based on matrix factorization.  ... 
doi:10.1007/978-3-642-33275-3_42 fatcat:d3h2huf36bbjtnxvkxgxaawayq
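
A minimal NumPy sketch of online matrix factorization via stochastic gradient descent (hypothetical sizes and hyperparameters; an illustration of the general technique, not the paper's formulation), where X stacks the combined visual and text features of each image row-wise:

    # Illustrative online matrix factorization with SGD: X ~ P @ Q.T.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 100, 50, 8          # items, features, latent factors
    lr, lam = 0.01, 0.1           # learning rate, L2 regularization
    X = rng.random((n, d))        # stand-in for the multimodal feature matrix
    P = 0.1 * rng.standard_normal((n, k))   # item (image) factors
    Q = 0.1 * rng.standard_normal((d, k))   # feature factors

    for step in range(50_000):
        # Sample one observed entry and take a gradient step on it.
        i, j = rng.integers(n), rng.integers(d)
        err = X[i, j] - P[i] @ Q[j]
        p_old = P[i].copy()
        P[i] += lr * (err * Q[j] - lam * P[i])
        Q[j] += lr * (err * p_old - lam * Q[j])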

MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [article]

Devamanyu Hazarika, Roger Zimmermann, Soujanya Poria
2020 arXiv   pre-print
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.  ...  These representations provide a holistic view of the multimodal data, which is used for fusion that leads to task predictions.  ...  Moreover, we incorporate orthogonal modality-specific representations, a trait less explored in multimodal learning tasks. Factorized representations:  ...
arXiv:2005.03545v3 fatcat:nyomobnpojcefpyllea3scjdpq
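
A minimal sketch of the invariant/specific split with a soft orthogonality penalty (illustrative; the encoder shapes and the squared cross-correlation penalty are assumptions, not the MISA code): a shared encoder maps every modality into a common subspace, private encoders keep modality-specific traits, and the penalty keeps the two parts non-redundant.

    # Illustrative invariant/specific factorization with orthogonality loss.
    import torch
    import torch.nn as nn

    def orthogonality_loss(a, b):
        # Soft penalty on the squared cross-correlation of two codes.
        return (a.T @ b).pow(2).sum() / a.shape[0]

    d, h, M = 64, 32, 3                      # feature dim, code dim, modalities
    shared = nn.Linear(d, h)                 # modality-invariant subspace
    private = nn.ModuleList([nn.Linear(d, h) for _ in range(M)])

    xs = [torch.randn(8, d) for _ in range(M)]        # a batch per modality
    inv = [shared(x) for x in xs]                     # invariant codes
    spec = [private[m](x) for m, x in enumerate(xs)]  # specific codes
    ortho = sum(orthogonality_loss(z_i, z_s) for z_i, z_s in zip(inv, spec))
    fused = torch.cat(inv + spec, dim=-1)             # input to a task head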

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications [article]

Chao Zhang, Zichao Yang, Xiaodong He, Li Deng
2020 arXiv   pre-print
This review provides a comprehensive analysis of recent works on multimodal deep learning from three perspectives: learning multimodal representations, fusing multimodal signals at various levels, and  ...  Regarding multimodal representation learning, we review the key concepts of embedding, which unify multimodal signals into a single vector space and thereby enable cross-modality signal processing.  ...  Representations can be factorized into two sets of independent factors: multimodal discriminative factors for supervised training and intra-modality generative factors for unsupervised training [119].  ...
arXiv:1911.03977v3 fatcat:ojazuw3qzvfqrdweul6qdpxuo4

Deep Multimodal Emotion Recognition on Human Speech: A Review

Panagiotis Koromilas, Theodoros Giannakopoulos
2021 Applied Sciences  
Finally, we conclude this work with an in-depth analysis of the future challenges related to validation procedures, representation learning and method robustness.  ...  In addition, we review the basic feature representation methods for each modality, and we present aggregated evaluation results on the reported methodologies.  ...  An example of such models is presented in [59], where the authors introduce the multimodal factorization model (MFM) that factorizes multimodal representations into multimodal discriminative factors  ...
doi:10.3390/app11177962 fatcat:cezjfmjmvbgapo3tdz5j3iecp4

Select-Additive Learning: Improving Generalization in Multimodal Sentiment Analysis [article]

Haohan Wang, Aaksha Meghawat, Louis-Philippe Morency, Eric P. Xing
2017 arXiv   pre-print
However, multimodal sentiment analysis has only a few high-quality data sets annotated for training machine learning algorithms.  ...  In this paper, we propose a Select-Additive Learning (SAL) procedure that improves the generalizability of trained neural networks for multimodal sentiment analysis.  ...  During the Selection phase, SAL identifies the confounding factors from the latent representation learned by neural networks.  ... 
arXiv:1609.05244v2 fatcat:7eizi5wqwvge5jgkid7dz6trw4

Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction [article]

Garrett B. Goh, Khushmeen Sakloth, Charles Siegel, Abhinav Vishnu, Jim Pfaendtner
2018 arXiv   pre-print
In this work, we develop a novel multimodal CNN-MLP neural network architecture that utilizes both domain-specific feature engineering as well as learned representations from raw data.  ...  However, in other domains, large datasets from which to learn representations may not exist.  ...  Our work therefore demonstrates that a multimodal network that combines the benefit of representation learning from raw data with expert-driven feature engineering is a viable approach in domain applications  ...
arXiv:1808.04456v2 fatcat:z4qxtmklejgtbkxjxbfmyretym
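
A minimal sketch of such a two-branch network (illustrative; the raw input is assumed to be a single-channel image-like encoding, and all layer sizes are hypothetical): a CNN branch learns representations from the raw data, an MLP branch consumes engineered descriptors, and the two are concatenated before the prediction head.

    # Illustrative two-branch CNN-MLP fusion of raw and engineered inputs.
    import torch
    import torch.nn as nn

    class CNNMLP(nn.Module):
        def __init__(self, n_descriptors, n_classes=2):
            super().__init__()
            # Learned branch: CNN over a raw single-channel encoding.
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten())   # -> 16 * 4 * 4 = 256
            # Engineered branch: MLP over expert-designed descriptors.
            self.mlp = nn.Sequential(
                nn.Linear(n_descriptors, 64), nn.ReLU())
            self.head = nn.Linear(256 + 64, n_classes)

        def forward(self, raw, engineered):
            return self.head(
                torch.cat([self.cnn(raw), self.mlp(engineered)], dim=-1))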

Disentangle, align and fuse for multimodal and semi-supervised image segmentation [article]

Agisilaos Chartsias, Giorgos Papanastasiou, Chengjia Wang, Scott Semple, David E. Newby, Rohan Dharmakumar, Sotirios A. Tsaftaris
2020 arXiv   pre-print
Core to our method is learning a disentangled decomposition into anatomical and imaging factors.  ...  The imaging factor captures signal intensity characteristics across data from different modalities and is used for image reconstruction, enabling semi-supervised learning.  ...  However, a disentangled representation with modality-invariant anatomy factors is not enough for multimodal learning.  ...
arXiv:1911.04417v4 fatcat:qxlay6fzz5fdlcpta2epygydf4

HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities [article]

Devanshu Arya, Stevan Rudinac, Marcel Worring
2019 arXiv   pre-print
Moreover, adding new modalities to our model requires only an additional GPU unit while keeping the computational time unchanged, which brings representation learning to truly multimodal datasets.  ...  Learning representations in such a scenario is inherently complex due to the presence of multiple heterogeneous information channels.  ...  Most existing multimodal representation learning methods can be split into two broad categories: multimodal network embeddings and tensor factorization-based latent representation learning.  ...
arXiv:1909.09252v1 fatcat:wneuun2p4rfdzphxa6lgaonfgy

Harmonized Multimodal Learning with Gaussian Process Latent Variable Models [article]

Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian
2019 arXiv   pre-print
Previous multimodal GPLVM extensions generally adopt individual learning schemes on latent representations and kernel hyperparameters, which ignore their intrinsic relationship.  ...  Multimodal learning aims to discover the relationship between multiple modalities. It has become an important research topic due to extensive multimodal applications such as cross-modal retrieval.  ...  DCCAE [29] is a deep extension of the popular CCA for deep multimodal representation learning.  ... 
arXiv:1908.04979v1 fatcat:hpjdzvau3fhzrnk5lz6ahjqeuu

Strong and Simple Baselines for Multimodal Utterance Embeddings

Paul Pu Liang, Yao Chong Lim, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Louis-Philippe Morency
2019 Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)
In order to capture richer representations, our second baseline extends the first by factorizing into unimodal, bimodal, and trimodal factors, while retaining simplicity and efficiency during learning  ...  In this paper, we propose two simple but strong baselines to learn embeddings of multimodal utterances. The first baseline assumes a conditional factorization of the utterance into unimodal factors.  ...  Baseline 1: Factorized Unimodal Model. In this subsection, we outline our method for learning representations of multimodal utterances.  ...
doi:10.18653/v1/n19-1267 dblp:conf/naacl/LiangLTSM19 fatcat:h3d66l3b2zevthf5j4khobij2i
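
A minimal sketch of the unimodal/bimodal/trimodal factorization (illustrative; projecting all modalities to one shared size so that element-wise products are well defined is an assumption, not the paper's exact construction): the utterance embedding concatenates per-modality terms, pairwise products, and the three-way product.

    # Illustrative unimodal/bimodal/trimodal factorized embedding.
    import torch
    import torch.nn as nn

    class FactorizedEmbedding(nn.Module):
        def __init__(self, dims, h=32):
            super().__init__()
            # Project each modality to a shared size h.
            self.proj = nn.ModuleList([nn.Linear(d, h) for d in dims])

        def forward(self, x1, x2, x3):
            u = [p(x) for p, x in zip(self.proj, (x1, x2, x3))]   # unimodal
            bi = [u[0] * u[1], u[0] * u[2], u[1] * u[2]]          # bimodal
            tri = [u[0] * u[1] * u[2]]                            # trimodal
            return torch.cat(u + bi + tri, dim=-1)                # (B, 7h)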

Multimodal Embeddings from Language Models [article]

Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan
2019 arXiv   pre-print
The resulting representations from this model are multimodal and contain paralinguistic information which can modify word meanings and provide affective information.  ...  Word embeddings such as ELMo have recently been shown to model word semantics with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant improvement  ...  To learn representations from multimodal data, Hsu et al. [29] proposed the use of variational autoencoders to encode inter- and intra-modal factors into separate latent variables. Later, Tsai et al.  ...
arXiv:1909.04302v1 fatcat:t5zdtkkznnea5lat4uip32mb7u
Showing results 1 — 15 out of 42,558 results