
Deep Vision Multimodal Learning: Methodology, Benchmark, and Trend

Wenhao Chai, Gaoang Wang
2022 Applied Sciences  
Deep vision multimodal learning aims at combining deep visual representation learning with other modalities, such as text, sound, and data collected from other sensors.  ...  With the fast development of deep learning, vision multimodal learning has gained much interest from the community.  ...  It shows that another modality mainly controls the loss value of the current modality. Multimodal loss function: cross-modal center loss [44].  ... 
doi:10.3390/app12136588 fatcat:bokdxwkcwbgjlpblfrwbj4mtxm
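The snippet above names a cross-modal center loss. A minimal sketch of that idea follows, assuming the usual center-loss formulation applied across modalities; the class, parameter, and tensor names are illustrative and not the implementation from [44].

```python
import torch
import torch.nn as nn

class CrossModalCenterLoss(nn.Module):
    """Pull features from all modalities toward a shared per-class center."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        # One learnable center per class, shared across modalities.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats_per_modality, labels):
        # feats_per_modality: list of (batch, feat_dim) tensors, one per modality
        # labels: (batch,) class indices
        centers = self.centers[labels]  # (batch, feat_dim)
        loss = sum(((f - centers) ** 2).sum(dim=1).mean()
                   for f in feats_per_modality)
        return loss / len(feats_per_modality)
```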

Deep Hierarchical Fusion with Application in Sentiment Analysis

Efthymios Georgiou, Charilaos Papaioannou, Alexandros Potamianos
2019 Interspeech 2019  
Two bidirectional Long Short-Term Memory networks (BiLSTM), followed by multiple fully connected layers, are trained in order to extract feature representations for each of the textual and audio modalities  ...  In this work, we introduce a hierarchical fusion scheme for sentiment analysis of spoken sentences.  ...  Acknowledgements: We would like to thank our anonymous reviewers for their feedback. Special thanks to Nikos Athanasiou and Georgios Paraskevopoulos for their suggestions.  ... 
doi:10.21437/interspeech.2019-3243 dblp:conf/interspeech/GeorgiouPP19 fatcat:4o5723d6l5c7jadb7em24vixvu
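The abstract describes per-modality BiLSTM encoders followed by fully connected fusion layers. A minimal sketch under those assumptions; layer sizes and names are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    """Per-modality BiLSTM encoders followed by fully connected fusion layers."""
    def __init__(self, text_dim, audio_dim, hidden=128, num_classes=3):
        super().__init__()
        self.text_lstm = nn.LSTM(text_dim, hidden, batch_first=True, bidirectional=True)
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True, bidirectional=True)
        self.fusion = nn.Sequential(
            nn.Linear(4 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, text_seq, audio_seq):
        # Use the final hidden states of each BiLSTM as the modality representation.
        _, (h_t, _) = self.text_lstm(text_seq)   # h_t: (2, batch, hidden)
        _, (h_a, _) = self.audio_lstm(audio_seq)
        t = torch.cat([h_t[0], h_t[1]], dim=-1)  # (batch, 2*hidden)
        a = torch.cat([h_a[0], h_a[1]], dim=-1)
        return self.fusion(torch.cat([t, a], dim=-1))
```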

Multimodal Data Fusion based on the Global Workspace Theory

Cong Bao, Zafeirios Fountas, Temitayo Olugbade, Nadia Bianchi-Berthouze
2020 Proceedings of the 2020 International Conference on Multimodal Interaction  
We propose a novel neural network architecture, named the Global Workspace Network (GWN), which addresses the challenge of dynamic and unspecified uncertainties in multimodal data fusion.  ...  based on the multimodal EmoPain dataset captured from people with chronic pain and healthy people performing different types of exercise movements in unconstrained settings.  ...  The memory fusion network of [59], which is based on a cross-modality attention module with a memory, is one such rare case.  ... 
doi:10.1145/3382507.3418849 dblp:conf/icmi/BaoFOB20 fatcat:uwqvhz72nzaatnxmvvlprwx7ea
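The snippet points to a cross-modality attention module paired with a memory. Below is a minimal sketch of that general pattern (a learned memory attending over modality features); the slot count, dimensions, and names are assumptions, not the GWN architecture itself.

```python
import torch
import torch.nn as nn

class MemoryAttentionFusion(nn.Module):
    """A learned shared memory attends over per-modality features."""
    def __init__(self, dim=64, memory_slots=8, heads=4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(memory_slots, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, modality_feats):
        # modality_feats: (batch, num_modalities, dim)
        b = modality_feats.size(0)
        mem = self.memory.unsqueeze(0).expand(b, -1, -1)  # (batch, slots, dim)
        fused, _ = self.attn(query=mem, key=modality_feats, value=modality_feats)
        return fused.mean(dim=1)  # (batch, dim) fused representation
```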

Deep Multimodal Emotion Recognition on Human Speech: A Review

Panagiotis Koromilas, Theodoros Giannakopoulos
2021 Applied Sciences  
...  although in one of the unimodal or multimodal interactions; and (iii) temporal architectures (TA), which try to capture both unimodal and cross-modal temporal dependencies.  ...  Finally, we conclude this work with an in-depth analysis of the future challenges related to validation procedures, representation learning, and method robustness.  ...  The memory component of the memory fusion network (MFN) is replaced with DFG to form the graph memory fusion network (Graph-MFN).  ... 
doi:10.3390/app11177962 fatcat:cezjfmjmvbgapo3tdz5j3iecp4

Multimodal Emotion Recognition from Art Using Sequential Co-Attention

Tsegaye Misikir Tashu, Sakina Hajiyeva, Tomas Horvath
2021 Journal of Imaging  
In this study, we present a multimodal emotion recognition architecture that uses both feature-level attention (sequential co-attention) and modality attention (weighted modality fusion) to classify emotion  ...  The proposed architecture helps the model to focus on learning informative and refined representations for both feature extraction and modality fusion.  ...  The Proposed Sequential Multimodal Fusion Model: Figure 1 shows the architecture of our sequential attention-based multimodal model with a weighted fusion approach.  ... 
doi:10.3390/jimaging7080157 pmid:34460793 pmcid:PMC8404915 fatcat:xktlzcr2zbc4rdb6tlk6uhh46i
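The weighted modality fusion mentioned above can be sketched as a learned softmax weighting over modality representations. This is a generic rendering of the idea, not the paper's exact module.

```python
import torch
import torch.nn as nn

class WeightedModalityFusion(nn.Module):
    """Score each modality representation and fuse by a learned softmax weighting."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, modality_feats):
        # modality_feats: (batch, num_modalities, dim)
        weights = torch.softmax(self.score(modality_feats), dim=1)  # (batch, M, 1)
        return (weights * modality_feats).sum(dim=1)                # (batch, dim)
```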

Multimodal Data Fusion based on the Global Workspace Theory [article]

Cong Bao, Zafeirios Fountas, Temitayo Olugbade, Nadia Bianchi-Berthouze
2020 arXiv   pre-print
We propose a novel neural network architecture, named the Global Workspace Network (GWN), which addresses the challenge of dynamic and unspecified uncertainties in multimodal data fusion.  ...  based on the multimodal EmoPain dataset captured from people with chronic pain and healthy people performing different types of exercise movements in unconstrained settings.  ...  The memory fusion network of (Zadeh et al., 2018), which is based on a cross-modality attention module with a memory, is one such rare case.  ... 
arXiv:2001.09485v2 fatcat:guzzqzj5ijhallygi4d2i5gumq

Unpaired Image-to-Speech Synthesis With Multimodal Information Bottleneck

Shuang Ma, Daniel Mcduff, Yale Song
2019 2019 IEEE/CVF International Conference on Computer Vision (ICCV)  
We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text).  ...  We address fundamental challenges of skip-modal generation: 1) learning multimodal representations using a single model, 2) bridging the domain gap between two unrelated datasets, and 3) learning the correspondence  ...  While this would help the network learn to bottleneck superfluous information present within each modality, it would miss out on the opportunity to learn cross-modal correspondences.  ... 
doi:10.1109/iccv.2019.00769 dblp:conf/iccv/MaMS19 fatcat:zij6abqoinghpk2324ovd5imoi
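The abstract's idea of bottlenecking superfluous per-modality information can be illustrated with a generic variational bottleneck encoder. This sketch assumes a standard VIB-style objective and is not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class BottleneckEncoder(nn.Module):
    """Compress a modality into a compact stochastic latent (VIB-style sketch)."""
    def __init__(self, in_dim: int, z_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        # Reparameterization trick: sample z from N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # KL to a unit Gaussian prior penalizes superfluous per-modality
        # information; a task loss elsewhere preserves cross-modal correspondence.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
        return z, kl
```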

HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm [article]

Md Mofijul Islam, Tariq Iqbal
2020 arXiv   pre-print
Although modern robots are equipped with various sensors, robust human activity recognition (HAR) still remains a challenging task for robots due to difficulties related to multimodal data fusion.  ...  Finally, multimodal features are used in a fully connected neural network to recognize human activities.  ...  Deep learning-based feature representation architectures, especially convolutional neural networks (CNNs) and long short-term memory (LSTM), have been widely adopted to encode the spatio-temporal features  ... 
arXiv:2008.01148v1 fatcat:hjh2z5cp7faxxkalynflahnd5y
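HAMLET's hierarchy, as described, first attends within each sensor stream and then fuses the resulting features through attention before a fully connected classifier. A minimal sketch of that two-level pattern; all dimensions and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalAttentionHAR(nn.Module):
    """Unimodal self-attention per sensor, then attention-based multimodal fusion."""
    def __init__(self, dim=64, heads=4, num_classes=10):
        super().__init__()
        self.unimodal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.multimodal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, streams):
        # streams: list of (batch, time, dim) tensors, one per sensor modality
        pooled = []
        for s in streams:
            attended, _ = self.unimodal(s, s, s)  # intra-modality attention
            pooled.append(attended.mean(dim=1))   # (batch, dim)
        m = torch.stack(pooled, dim=1)            # (batch, num_modalities, dim)
        fused, _ = self.multimodal(m, m, m)       # cross-modality attention
        return self.classifier(fused.mean(dim=1))
```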

Multimodal Classification: Current Landscape, Taxonomy and Future Directions [article]

William C. Sleeman IV, Rishabh Kapoor, Preetam Ghosh
2021 arXiv   pre-print
We address these challenges by proposing a new taxonomy for describing such systems based on trends found in recent publications on multimodal classification.  ...  Many of the most difficult aspects of unimodal classification have not yet been fully addressed for multimodal datasets including big data, class imbalance, and instance-level difficulty.  ...  Cross-modality fusion allows for the sharing of modality-specific data before or during the primary learning stage.  ... 
arXiv:2109.09020v1 fatcat:yagsbnxeefcpneqwgflrxxioqa
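The snippet's point about sharing modality-specific data before the primary learning stage corresponds to early (feature-level) fusion. A minimal illustration, with dimensions chosen arbitrarily for the sketch.

```python
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    """Early (cross-modality) fusion: concatenate modality features before
    the primary classifier, in contrast to late fusion of per-modality outputs."""
    def __init__(self, dims=(32, 64), hidden=128, num_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sum(dims), hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, feats):
        # feats: list of per-modality tensors, each (batch, dim_i)
        return self.net(torch.cat(feats, dim=-1))
```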

Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck [article]

Shuang Ma, Daniel McDuff, Yale Song
2019 arXiv   pre-print
We address fundamental challenges of skip-modal generation: 1) learning multimodal representations using a single model, 2) bridging the domain gap between two unrelated datasets, and 3) learning the correspondence  ...  We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text).  ...  While this would help the network learn to bottleneck superfluous information present within each modality, it would miss out on the opportunity to learn cross-modal correspondences.  ... 
arXiv:1908.07094v1 fatcat:keax7azkhjgj5nenttwvcnv45q

Multimodal Classification: Current Landscape, Taxonomy and Future Directions

William C. Sleeman IV, Rishabh Kapoor, Preetam Ghosh
2022 ACM Computing Surveys  
Multimodal classification research has been gaining popularity with new datasets in domains such as satellite imagery, biometrics, and medicine.  ...  We address these challenges by proposing a new taxonomy for describing multimodal classification models based on trends found in recent publications.  ...  This cross-modality fusion architecture is often also used with deep belief network (DBN) or autoencoder-style networks [25, 32, 59].  ... 
doi:10.1145/3543848 fatcat:ejigpgm5gnabvc4jrb3nml5l4y

Multimodality in Meta-Learning: A Comprehensive Survey [article]

Yao Ma, Shilin Zhao, Weixiao Wang, Yaoman Li, Irwin King
2022 arXiv   pre-print
We first formalize the definition of meta-learning in multimodality, along with the research challenges in this growing field, such as how to enrich the input in few-shot learning (FSL) or zero-shot learning  ...  Finally, we propose potential research directions for this promising field.  ...  Figure 7: Network structures for cross-modal augmentation. Left: Adapted from [81, 74]. Meta-learning with hallucination.  ... 
arXiv:2109.13576v2 fatcat:khofp6ldxrcafa7iyk7g6pxchi

Multimodal Deep Learning for Activity and Context Recognition

Valentin Radu, Catherine Tong, Sourav Bhattacharya, Nicholas D. Lane, Cecilia Mascolo, Mahesh K. Marina, Fahim Kawsar
2018 Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies  
The second, a novel alternative that we term Modality-Specific Architecture (MA), is a deep-learning-specific technique that places emphasis on learning both intra-modality and cross-modality relations.  ...  For each dataset, we evaluate: (i) four deep learning techniques based on FC and MA, with Deep Neural Networks (DNN) and Convolutional Neural Networks (CNNs) as base classifiers; (ii) two shallow classifier  ...  (FC) with concatenated inputs from multiple sensing modalities, and Modality-Specific Architecture (MA), with sensor-specific branches for each modality before fusion is achieved later in the network.  ... 
doi:10.1145/3161174 fatcat:dvp6jljcx5a23lw4knaimlcv3m
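The Modality-Specific Architecture described above gives each sensor its own branch before fusion, in contrast to directly concatenating inputs. A minimal sketch of that layout, with branch sizes as illustrative assumptions.

```python
import torch
import torch.nn as nn

class ModalitySpecificArchitecture(nn.Module):
    """Sensor-specific branches learn intra-modality structure before fusion."""
    def __init__(self, modality_dims=(9, 6), branch_hidden=64, num_classes=8):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(d, branch_hidden), nn.ReLU(),
                          nn.Linear(branch_hidden, branch_hidden), nn.ReLU())
            for d in modality_dims
        )
        self.head = nn.Linear(branch_hidden * len(modality_dims), num_classes)

    def forward(self, inputs):
        # inputs: list of per-sensor tensors, each (batch, dim_i)
        fused = torch.cat([b(x) for b, x in zip(self.branches, inputs)], dim=-1)
        return self.head(fused)
```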

Cross-Modal Self-Attention Network for Referring Image Segmentation [article]

Linwei Ye, Mrigank Rochan, Zhi Liu, Yang Wang
2019 arXiv   pre-print
In addition, we propose a gated multi-level fusion module to selectively integrate self-attentive cross-modal features corresponding to different levels in the image.  ...  In this paper, we propose a cross-modal self-attention (CMSA) module that effectively captures the long-range dependencies between linguistic and visual features.  ...  Thanks to NVIDIA for donating some of the GPUs used in this work.  ... 
arXiv:1904.04745v1 fatcat:x2yi54nxyffmni7gthqnls4x7a
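The gated multi-level fusion described above selectively integrates self-attentive features from different levels. A generic sketch of per-level gating follows; the gating form and dimensions are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class GatedMultiLevelFusion(nn.Module):
    """Gate per-level cross-modal features before summing them."""
    def __init__(self, dim=256, num_levels=3):
        super().__init__()
        self.gates = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_levels))

    def forward(self, level_feats):
        # level_feats: list of (batch, dim) self-attentive features,
        # one per feature level in the image backbone
        fused = 0
        for gate, f in zip(self.gates, level_feats):
            fused = fused + torch.sigmoid(gate(f)) * f  # gated contribution
        return fused
```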

New Ideas and Trends in Deep Multimodal Content Understanding: A Review [article]

Wei Chen and Weiping Wang and Li Liu and Michael S. Lew
2020 arXiv   pre-print
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text.  ...  These models go beyond the simple image classifiers in which they can do uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering  ...  As reported in Figure 4, memory-augmented networks are used in cross-modal retrieval [150]. We illustrate memory-augmented networks for multimodal learning in Figure 6.  ... 
arXiv:2010.08189v1 fatcat:2l7molbcn5hf3oyhe3l52tdwra
Showing results 1 — 15 out of 1,889 results