Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions [article]

Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha
2021 arXiv   pre-print
Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems.  ...  Our final goal is to discuss challenges and perspectives along with the important ideas and directions for future work that we hope to be beneficial for the entire research community focusing on this exciting  ...  Knowledge distillation uses teacher-student network and privileged information learning [46] to have a multimodal distillation network for video action recognition using RGB and depth modality.  ... 
arXiv:2107.13782v2 fatcat:s4spofwxjndb7leqbcqnwbifq4

Multi-Layered Multimodal Biometric Authentication for Smartphone Devices

Qurban A Memon
2020 International Journal of Interactive Mobile Technologies  
In current literature, multimodal biometric approach is addressed at length for purpose of improving secured access into personal devices.  ...  In this paper, a multilayered multimodal biometric approach using three biometric methods (such as finger print, face and voice) is proposed for smartphones.  ...  In this paper, a multi-step/multi-layered multimodal biometric for smartphone devices is presented to investigate authentication with full or partial privileges depending on work environments.  ... 
doi:10.3991/ijim.v14i15.15825 fatcat:avy573ktqvd4rayqrb7k2hru34

Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)

Santiago Castro, Devamanyu Hazarika, Verónica Pérez-Rosas, Roger Zimmermann, Rada Mihalcea, Soujanya Poria
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
The full dataset is publicly available for use at https://github.com/soujanyaporia/MUStARD. * Equal contribution. 1 MUStARD is an abbreviation for MUltimodal SARcasm Dataset.  ...  MUStARD consists of audiovisual utterances annotated with sarcasm labels.  ...  Acknowledgements We are grateful to Gautam Naik for his help in curating part of the dataset from online resources.  ... 
doi:10.18653/v1/p19-1455 dblp:conf/acl/CastroHPZMP19 fatcat:q7j7t5y43rf3ljevh5avfmgpjm

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning [article]

Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
2021 arXiv   pre-print
Learning multimodal representations involves integrating information from multiple heterogeneous sources of data.  ...  MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections.  ...  Multimodal models outperform unimodal models when it comes to robustness (and initial performance). This is especially true for imperfections in the image modality.  ... 
arXiv:2107.07502v2 fatcat:ls47dr7lpfhkbfry4r6dtqjtua

MultiViz: An Analysis Benchmark for Visualizing and Understanding Multimodal Models [article]

Paul Pu Liang, Yiwei Lyu, Gunjan Chhablani, Nihal Jain, Zihao Deng, Xingbo Wang, Louis-Philippe Morency, Ruslan Salakhutdinov
2022 arXiv   pre-print
Our paper aims to fill this gap by proposing MultiViz, a method for analyzing the behavior of multimodal models by scaffolding the problem of interpretability into 4 stages: (1) unimodal importance: how  ...  The promise of multimodal models for real-world applications has inspired research in visualizing and understanding their internal mechanics with the end goal of empowering stakeholders to visualize model  ...  PPL is partially supported by a Facebook PhD Fellowship and a Carnegie Mellon University's Center for Machine Learning and Health Fellowship.  ... 
arXiv:2207.00056v1 fatcat:vxg2lcvm6jgghldw74b7onwjje

Towards A Multi-agent System for Online Hate Speech Detection [article]

Gaurav Sahu, Robin Cohen, Olga Vechtomova
2021 arXiv   pre-print
We introduce a novel framework employing deep learning techniques to coordinate the channels of textual and image processing.  ...  This paper envisions a multi-agent system for detecting the presence of hate speech in online social media platforms such as Twitter and Facebook.  ...  For binary classification experiments, we first observe that unimodal BiL is highly competitive with the multimodal FCM, SCM, and TKM baselines.  ... 
arXiv:2105.01129v1 fatcat:lakkm66thrfy3kidfc734cfjbe

A review of speech-based bimodal recognition

C.C. Chibelushi, F. Deravi, J.S.D. Mason
2002 IEEE transactions on multimedia  
Multimodal recognition is therefore acknowledged as a vital component of the next generation of spoken language systems.  ...  Speech recognition and speaker recognition by machine are crucial ingredients for many important applications such as natural and flexible human-machine interfaces.  ...  Also, to combat data variability, the symbiotic combination of sensor fusion with mature techniques developed for robust unimodal recognition, is also a worthwhile research avenue.  ... 
doi:10.1109/6046.985551 fatcat:6fezo5zovbdtti3lzxh24ksaii

Multisensory perception as an associative learning process

Kevin Connolly
2014 Frontiers in Psychology  
A third reason the debate is important is that a view that makes multimodal perception a flexible, learned process (see, for instance, Connolly, 2014) fits more naturally with the emerging view of perception  ...  Say, for instance, that the target set is a set of two horizontal and one vertical line segments.  ... 
doi:10.3389/fpsyg.2014.01095 pmid:25309498 pmcid:PMC4176039 fatcat:i2ywyzadcbahvgvikfvp5mr6ja

RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR [article]

Yuyin Zhou, Shih-Cheng Huang, Jason Alan Fries, Alaa Youssef, Timothy J. Amrhein, Marcello Chang, Imon Banerjee, Daniel Rubin, Lei Xing, Nigam Shah, Matthew P. Lungren
2021 arXiv   pre-print
imaging are unimodal, i.e., they only learn features from pixel-level information.  ...  To better assess these challenges, we present RadFusion, a multimodal, benchmark dataset of 1794 patients with corresponding EHR data and high-resolution computed tomography (CT) scans labeled for pulmonary  ...  in TPR (i.e., Sensitivity) for the privileged and under-privileged groups following the evaluation protocol in [19, 46, 37].  ... 
arXiv:2111.11665v2 fatcat:jtbpdwnq6jb2znuw6qucfjf3r4

Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View with a Reachability Prior [article]

Osama Makansi, Özgün Cicek, Kevin Buchicchio, Thomas Brox
2020 arXiv   pre-print
Experiments show that the reachability prior combined with multi-hypotheses learning improves multimodal prediction of the future location of tracked objects and, for the first time, the emergence of new  ...  In contrast to many previous works, we do not assume structural knowledge from maps.  ...  Hence, a multimodal method is favored over a unimodal one. Still, such significant improvement indicates the need for multimodality.  ... 
arXiv:2006.04700v1 fatcat:2arah4qiybdaro63tykxw24izy

Context-based recognition during human interactions

Louis-Philippe Morency, Iwan de Kok, Jonathan Gratch
2008 Proceedings of the 10th international conference on Multimodal interfaces - IMCI '08  
For example, in a dyadic interaction, the speaker contextual cues such as gaze shifts or changes in prosody will influence listener backchannel feedback (e.g., head nod).  ...  Multimodal integration between context and visual observations is performed using a discriminative sequential model (Latent-Dynamic Conditional Random Fields) trained on previous interactions.  ...  In every case, no knowledge of the future is needed.  ... 
doi:10.1145/1452392.1452426 dblp:conf/icmi/MorencyKG08 fatcat:4pwhgrrslnez3emvdgyhhggppy

Somesthetic, Visual, and Auditory Feedback and Their Interactions Applied to Upper Limb Neurorehabilitation Technology: A Narrative Review to Facilitate Contextualization of Knowledge

Camille E. Proulx, Manouchka T. Louis Jean, Johanne Higgins, Dany H. Gagnon, Numa Dancause
2022 Frontiers in Rehabilitation Sciences  
This would better align with the new trend in stroke rehabilitation which challenges the popular idea of the existence of an ultimate good-for-all intervention.  ...  Reduced hand dexterity is a common component of sensorimotor impairments for individuals after stroke.  ...  privileged over a unimodal approach.  ... 
doi:10.3389/fresc.2022.789479 fatcat:fb2noy4rlndzpapd46humlirsm

From sensation to cognition

M. Mesulam
1998 Brain  
The unique role of these areas is to bind multiple unimodal and other transmodal areas into distributed but integrated multimodal representations.  ...  This process occurs along a core synaptic hierarchy which includes the primary sensory, upstream unimodal, downstream unimodal, heteromodal, paralimbic and limbic zones of the cerebral cortex.  ...  Nobre for detailed and insightful comments. This work was supported in part by NS 20285, NS 30863 and AG 13854.  ... 
doi:10.1093/brain/121.6.1013 pmid:9648540 fatcat:xv6jy7ff7nfpjb4ruzrhqwzfym

2021 Index IEEE Transactions on Multimedia Vol. 23

2021 IEEE transactions on multimedia  
The Author Index contains the primary entry for each item, listed under the first author's name.  ...  Zuo, Y., +, TMM 2021 772-783 Hard Pixel Mining for Depth Privileged Semantic Segmentation.  ...  Liu, Z., +, TMM 2021 3414-3426 Data mining Hard Pixel Mining for Depth Privileged Semantic Segmentation.  ... 
doi:10.1109/tmm.2022.3141947 fatcat:lil2nf3vd5ehbfgtslulu7y3lq

Multimodal Semantic Segmentation in Autonomous Driving: A Review of Current Approaches and Future Perspectives

Giulia Rizzoli, Francesco Barbato, Pietro Zanuttigh
2022 Technologies  
Then we review several different deep learning architectures for multimodal semantic segmentation.  ...  us to improve performances with respect to the exploitation of a single source of information.  ...  Outline In this paper, we focus on analyzing and discussing deep learning based fusion methods in multimodal semantic segmentation.  ... 
doi:10.3390/technologies10040090 fatcat:zfalhwsz3ndrnog62l22ucqc7e