
A Review on Methods and Applications in Multimodal Deep Learning [article]

Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Jabbar Abdul
2022 arXiv   pre-print
Despite the extensive development made for unimodal learning, it still cannot cover all the aspects of human learning.  ...  Lastly, the main issues are highlighted separately for each domain, along with their possible future research directions.  ...  extracts better event representations. Additionally, temporal attention enhances the description process by focusing on temporal frame features while captioning words; mainly two types of modalities are used  ...
arXiv:2202.09195v1 fatcat:wwxrmrwmerfabbenleylwmmj7y

A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances [article]

Yan Wang, Wei Song, Wei Tao, Antonio Liotta, Dawei Yang, Xinlei Li, Shuyong Gao, Yixuan Sun, Weifeng Ge, Wei Zhang, Wenqiang Zhang
2022 arXiv   pre-print
Firstly, we introduce two typical emotion models, followed by commonly used databases for affective computing.  ...  (e.g., emotion recognition and sentiment analysis).  ...  [152] proposed an attention-based CNN-RNN deep model (ABCDM), which utilized bidirectional LSTM and GRU layers to capture temporal contexts and applied attention operations to the discriminative embeddings  ...
arXiv:2203.06935v3 fatcat:h4t3omkzjvcejn2kpvxns7n2qe
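The ABCDM architecture summarized in this snippet is concrete enough to sketch. Below is a minimal, hedged illustration in PyTorch, not the authors' implementation: parallel bidirectional LSTM and GRU branches over shared embeddings, a learned attention score that pools each branch over time, and a classifier over the concatenated summaries. All names and sizes are assumptions, and ABCDM's convolutional and pooling stages after attention are omitted for brevity.

```python
# Minimal sketch of an ABCDM-style classifier (assumed sizes; the published
# model also applies convolution and pooling after attention, omitted here).
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Learned scores over time steps, used to pool a sequence to one vector."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                        # h: (batch, seq, dim)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over time
        return (w * h).sum(dim=1)                # (batch, dim)


class ABCDMSketch(nn.Module):
    def __init__(self, vocab=10000, emb=128, hid=64, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.gru = nn.GRU(emb, hid, bidirectional=True, batch_first=True)
        self.att_lstm = TemporalAttention(2 * hid)
        self.att_gru = TemporalAttention(2 * hid)
        self.fc = nn.Linear(4 * hid, classes)

    def forward(self, tokens):                   # tokens: (batch, seq) int64
        e = self.embed(tokens)
        h_lstm, _ = self.lstm(e)                 # both branches see the same input
        h_gru, _ = self.gru(e)
        pooled = torch.cat([self.att_lstm(h_lstm), self.att_gru(h_gru)], dim=-1)
        return self.fc(pooled)


logits = ABCDMSketch()(torch.randint(0, 10000, (4, 20)))  # (4, 2)
```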

RGB-D-based Human Motion Recognition with Deep Learning: A Survey [article]

Pichao Wang and Wanqing Li and Philip Ogunbona and Jun Wan and Sergio Escalera
2018 arXiv   pre-print
The reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth-based, skeleton-based and RGB+D-based.  ...  Specifically, deep learning methods based on the CNN and RNN architectures have been adopted for motion recognition using RGB-D data.  ...  [111] proposed a soft attention model for action recognition based on LSTM (see Figure.  ... 
arXiv:1711.08362v2 fatcat:cugugpqeffcshnwwto4z2aw4ti
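The soft attention model for LSTM-based action recognition mentioned in this snippet follows a well-known pattern that can be sketched briefly. This is a hedged approximation with assumed dimensions, not the cited authors' code: at each frame, the previous hidden state scores the spatial locations of a CNN feature cube, and the attention-weighted feature is fed to an LSTM cell.

```python
# Minimal sketch of soft spatial attention feeding an LSTM (assumed sizes).
import torch
import torch.nn as nn


class SoftAttentionLSTM(nn.Module):
    def __init__(self, feat_dim=512, hid=256, classes=10):
        super().__init__()
        self.cell = nn.LSTMCell(feat_dim, hid)
        self.att = nn.Linear(hid + feat_dim, 1)   # scores one spatial location
        self.fc = nn.Linear(hid, classes)

    def forward(self, cube):                      # cube: (batch, T, L, feat_dim)
        b, T, L, d = cube.shape
        h = cube.new_zeros(b, self.cell.hidden_size)
        c = torch.zeros_like(h)
        for t in range(T):
            x = cube[:, t]                        # (b, L, d) spatial features
            ctx = h.unsqueeze(1).expand(b, L, -1) # previous state guides attention
            alpha = torch.softmax(self.att(torch.cat([ctx, x], -1)), dim=1)
            h, c = self.cell((alpha * x).sum(1), (h, c))  # attend, then step
        return self.fc(h)


out = SoftAttentionLSTM()(torch.randn(2, 8, 49, 512))  # 7x7 grid -> L = 49
```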

Visual Sensation and Perception Computational Models for Deep Learning: State of the art, Challenges and Prospects [article]

Bing Wei, Yudi Zhao, Kuangrong Hao, Lei Gao
2021 arXiv   pre-print
This survey will provide a comprehensive reference for research in this direction.  ...  Computational models inspired by visual perception have the characteristics of complexity and diversity, as they come from many subjects such as cognition science, information science, and artificial intelligence  ...  [54] designed an attention-based bidirectional CNN-RNN deep model (ABCDM) for extracting both the temporal information and the attention information by using bidirectional LSTM and gated recurrent unit  ...
arXiv:2109.03391v1 fatcat:xtgda2x6azd2laun45tqfj77gi

Multi-Modal Sequence Fusion via Recursive Attention for Emotion Recognition

Rory Beard, Ritwik Das, Raymond W. M. Ng, P. G. Keerthana Gopalakrishnan, Luka Eerens, Pawel Swietojanski, Ondrej Miksik
2018 Proceedings of the 22nd Conference on Computational Natural Language Learning  
Inspired by the empirical success of recent so-called End-To-End Memory Networks (Sukhbaatar et al., 2015), we propose an approach based on recursive multi-attention with a shared external memory updated  ...  Natural human communication is nuanced and inherently multi-modal.  ...  In this paper, we address multi-modal sequence fusion for automatic emotion recognition.  ... 
doi:10.18653/v1/k18-1025 dblp:conf/conll/BeardDNGESM18 fatcat:rqgrht7sgnhsvm4bsqlwgmnhqy
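The recursive attention over a shared external memory that this paper adopts from End-To-End Memory Networks (Sukhbaatar et al., 2015) reduces to a few lines. A minimal sketch with the hop count and dimensions assumed, and the original model's separate input/output memory embeddings collapsed into one for brevity:

```python
# Minimal sketch of multi-hop attention over a shared memory (assumed dims).
import torch


def memory_hops(query, memory, hops=3):
    """query: (batch, dim); memory: (batch, slots, dim)."""
    u = query
    for _ in range(hops):
        p = torch.softmax(torch.einsum("bd,bsd->bs", u, memory), dim=1)
        o = torch.einsum("bs,bsd->bd", p, memory)  # attention-weighted readout
        u = u + o                                  # recursive query refinement
    return u


fused = memory_hops(torch.randn(4, 64), torch.randn(4, 10, 64))  # (4, 64)
```

Each hop re-attends over the same memory with the refined query, which is what "recursive multi-attention" refers to in the abstract above.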

Multimodal Research in Vision and Language: A Review of Current and Emerging Trends [article]

Shagun Uppal, Sarthak Bhagat, Devamanyu Hazarika, Navonil Majumdar, Soujanya Poria, Roger Zimmermann, Amir Zadeh
2020 arXiv   pre-print
In this paper, we present a detailed overview of the latest trends in research pertaining to visual and language modalities.  ...  Moreover, we shed some light on multi-disciplinary patterns and insights that have emerged in the recent past, directing this field towards more modular and transparent intelligent systems.  ...  This has been further explored with bidirectional matching [145], benefiting end-to-end frameworks with attention-based settings.  ...
arXiv:2010.09522v2 fatcat:l4npstkoqndhzn6hznr7eeys4u

Deep Audio-visual Learning: A Survey

Hao Zhu, Man-Di Luo, Rui Wang, Ai-Hua Zheng, Ran He
2021 International Journal of Automation and Computing  
Researchers tend to leverage these two modalities to improve the performance of previously considered single-modality tasks or address new challenging problems.  ...  Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully.  ...  (e.g., mutual information, temporal information) or adjust the structure of the network, such as using RNNs and LSTMs, extending the model structure, or preprocessing the input, etc., to obtain better representations  ...
doi:10.1007/s11633-021-1293-0 fatcat:an5lfyf4m5fh7mlngmdcbx7joy

Deep Learning for Action and Gesture Recognition in Image Sequences: A Survey [chapter]

Maryam Asadi-Aghbolaghi, Albert Clapés, Marco Bellantonio, Hugo Jair Escalante, Víctor Ponce-López, Xavier Baró, Isabelle Guyon, Shohreh Kasaei, Sergio Escalera
2017 Gesture Recognition  
A survey on deep learning based approaches for action and gesture recognition in image sequences.  ...  This chapter is a survey of current deep learning based methodologies for action and gesture recognition in sequences of images.  ...  Hugo Jair Escalante was supported by CONACyT under grants CB2014-241306 and PN-215546.  ... 
doi:10.1007/978-3-319-57021-1_19 fatcat:d2m5oyomsjhkbfpunhefho6ayq

Deep Learning for Sentiment Analysis : A Survey [article]

Lei Zhang, Shuai Wang, Bing Liu
2018 arXiv   pre-print
Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results.  ...  Acknowledgments Bing Liu and Shuai Wang's work was supported in part by National Science Foundation (NSF) under grant no. IIS-1407927 and IIS-1650900, and by Huawei Technologies Co.  ...  Poria et al. [134] proposed a deep learning model for multi-modal sentiment analysis and emotion recognition on video data.  ...
arXiv:1801.07883v2 fatcat:nplicfgaozb6fbfx4eyts4zt7e

Sentiment analysis using deep learning approaches: an overview

Olivier Habimana, Yuhua Li, Ruixuan Li, Xiwu Gu, Ge Yu
2019 Science China Information Sciences  
Suggestions include the use of bidirectional encoder representations from transformers (BERT), sentiment-specific word embedding models, cognition-based attention models, common sense knowledge, reinforcement  ...  The main benefit of machine learning approaches is their ability to learn representations. Pang et al. [8] pioneered the use of these techniques for sentiment analysis.  ...  [114] constructed a model titled GME-LSTM(A), which integrates gated multi-modal embedding (GME) and LSTM with temporal attention (LSTM(A)).  ...
doi:10.1007/s11432-018-9941-6 fatcat:nbevrfiyybhszirol2af26c6ve
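The GME-LSTM(A) model named in this snippet combines two pieces that are easy to illustrate: a learned gate that attenuates a noisy modality before fusion, and temporal attention pooling over LSTM outputs. A minimal sketch under assumed feature sizes, not the published implementation:

```python
# Minimal sketch of gated multimodal embedding + LSTM with temporal attention.
import torch
import torch.nn as nn


class GMELSTMSketch(nn.Module):
    def __init__(self, text_dim=300, av_dim=74, hid=128, classes=2):
        super().__init__()
        # Gate conditioned on both modalities, applied to the audio-visual input.
        self.gate = nn.Sequential(nn.Linear(text_dim + av_dim, av_dim), nn.Sigmoid())
        self.lstm = nn.LSTM(text_dim + av_dim, hid, batch_first=True)
        self.score = nn.Linear(hid, 1)            # temporal attention scores
        self.fc = nn.Linear(hid, classes)

    def forward(self, text, av):                  # both: (batch, seq, dim)
        g = self.gate(torch.cat([text, av], -1))  # suppress noisy time steps
        h, _ = self.lstm(torch.cat([text, g * av], -1))
        w = torch.softmax(self.score(h), dim=1)   # attend over time
        return self.fc((w * h).sum(1))


out = GMELSTMSketch()(torch.randn(2, 30, 300), torch.randn(2, 30, 74))  # (2, 2)
```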

Deep Graph Fusion based Multimodal Evoked Expressions from Large-Scale Videos

Ngoc-Huynh Ho, Hyung-Jeong Yang, Soo-Hyung Kim, Gueesang Lee, Seok-Bong Yoo
2021 IEEE Access  
To begin, we extract features for each 30-second segment's visual and auditory modalities using CNN-based pre-trained models to understand their salient representations.  ...  As a result, in this paper, we propose a hybrid fusion model termed deep graph fusion for predicting viewers' elicited expressions from videos by leveraging the combination of visual-audio representations  ...  • Attention-based LSTM (long short-term memory): we conduct the same feature extraction as in Section III-A and apply an attention-based LSTM to learn temporal information for a 30-second sequence.  ...
doi:10.1109/access.2021.3107548 fatcat:4lpfqirhlrb5bghdkybjjgdvse
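The per-segment extraction step described in this snippet (pre-trained CNN features for each 30-second segment) might look like the following. The ResNet-18 backbone and mean pooling over sampled frames are assumptions for illustration; the paper's exact backbone and pooling are not given in the snippet.

```python
# Minimal sketch: pooled pre-trained CNN features for one 30-second segment.
import torch
from torchvision.models import resnet18

backbone = resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()         # expose the 512-d pooled features
backbone.eval()

frames = torch.randn(16, 3, 224, 224)     # frames sampled from one segment
with torch.no_grad():
    segment_feature = backbone(frames).mean(dim=0)  # (512,) per segment
```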

Deep Audio-Visual Learning: A Survey [article]

Hao Zhu, Mandi Luo, Rui Wang, Aihua Zheng, Ran He
2020 arXiv   pre-print
Researchers tend to leverage these two modalities either to improve the performance of previously considered single-modality tasks or to address new challenging problems.  ...  Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully.  ...  Furthermore, temporal information was extracted by a bidirectional LSTM.  ... 
arXiv:2001.04758v1 fatcat:p6ph5cujl5do3pzlpvcce35nvi

Recent Advances in Recurrent Neural Networks [article]

Hojjat Salehinejad, Sharan Sankar, Joseph Barfett, Errol Colak, Shahrokh Valaee
2018 arXiv   pre-print
A well-trained RNN can model any dynamical system; however, training RNNs is mostly plagued by issues in learning long-term dependencies.  ...  In this paper, we present a survey on RNNs and several new advances for newcomers and professionals in the field.  ...  using RNNs; 2005, Graves: BLSTM (bidirectional LSTM); 2007, Jaeger: leaky integration neurons; 2007, Graves: MDRNN (multi-dimensional RNNs); 2009, Graves: LSTM for handwriting recognition; 2010, Mikolov  ...
arXiv:1801.01078v3 fatcat:ioxziqbkmzdrfoh2kukul6xlku
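Since this snippet attributes the difficulty of RNN training to long-term dependencies, it is worth restating the standard LSTM equations that most of the surveyed variants (BLSTM, MDRNN, etc.) build on; the additive cell update in the last line is what lets gradients propagate across many time steps:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}
```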

Two-level Attention with Two-stage Multi-task Learning for Facial Emotion Recognition [article]

Xiaohua Wang, Muzi Peng, Lijuan Pan, Min Hu, Chunhua Jin, Fuji Ren
2018 arXiv   pre-print
In this paper, a two-level attention with two-stage multi-task learning (2Att-2Mt) framework is proposed for facial emotion estimation on only static images.  ...  Owing to the inherent complexity of dimensional emotion recognition, we propose a two-stage multi-task learning structure to exploit categorical representations to ameliorate the dimensional representations  ...  Acknowledgments This research has been partially supported by Na-  ...
arXiv:1811.12139v1 fatcat:wtvre6qd6nfpno2dbxahcz65qi

RGB-D Data-Based Action Recognition: A Review

Muhammad Bilal Shaikh, Douglas Chai
2021 Sensors  
Naturally, each action-data modality—such as RGB, depth, skeleton, and infrared (IR)—has distinct characteristics; therefore, it is important to exploit the value of each modality for better action recognition  ...  We conclude by discussing research challenges, emerging trends, and possible future research directions.  ...  Acknowledgments: The authors would like to thank the anonymous reviewers for their careful reading and valuable remarks, which have greatly helped extend the scope of this paper.  ... 
doi:10.3390/s21124246 fatcat:7dvocdy63rckne5yunhfsnr4p4
Showing results 1–15 of 205