Gated spatio and temporal convolutional neural network for activity recognition: towards gated multimodal deep learning

Novanto Yudistira, Takio Kurita
2017 EURASIP Journal on Image and Video Processing  
A mixture of experts via a gating Convolutional Neural Network (CNN) is one promising architecture for adaptively weighting every sample within a dataset.  ...  Human activity recognition requires both visual and temporal cues, making it challenging to integrate these important modalities.  ...  We thank Kim Moravec, PhD, from Edanz Group (www.edanzediting.com/ac) for editing a draft of this manuscript.  ... 
doi:10.1186/s13640-017-0235-9 fatcat:fjt34uls6zdbvi3cjqsocb6wdm

RGB-D Data-Based Action Recognition: A Review

Muhammad Bilal Shaikh, Douglas Chai
2021 Sensors  
The increase in the number of action recognition datasets intersects with advances in deep learning architectures and computational support, both of which offer significant research opportunities.  ...  Naturally, each action-data modality, such as RGB, depth, skeleton, and infrared (IR), has distinct characteristics; therefore, it is important to exploit the value of each modality for better action recognition  ...  Figure 5. Illustration of deep learning techniques for processing RGB-D data. (a) Convolutional Neural Network (CNN). (b) Long Short-Term Memory (LSTM). (c) Graph Convolutional Network (GCN).  ... 
doi:10.3390/s21124246 fatcat:7dvocdy63rckne5yunhfsnr4p4

Differential Recurrent Neural Networks for Action Recognition [article]

Vivek Veeriah and Naifan Zhuang and Guo-Jun Qi
2015 arXiv   pre-print
The long short-term memory (LSTM) neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences.  ...  To address this problem, we propose a differential gating scheme for the LSTM neural network, which emphasizes on the change in information gain caused by the salient motions between the successive frames  ...  Recently, the huge success of deep networks in image classification [16] and speech recognition [9] has inspired many researchers to apply the deep neural networks, such as 3D Convolutional Neural  ... 
arXiv:1504.06678v1 fatcat:enbm3tvfyrfzbbbksskxipak3i

Associated Spatio-Temporal Capsule Network for Gait Recognition [article]

Aite Zhao, Junyu Dong, Jianbo Li, Lin Qi, Huiyu Zhou
2021 arXiv   pre-print
Therefore, we here establish an automated learning system, with an associated spatio-temporal capsule network (ASTCapsNet) trained on multi-sensor datasets, to analyze multimodal information for gait recognition  ...  Specifically, we first design a low-level feature extractor and a high-level feature extractor for spatio-temporal feature extraction of gait with a novel recurrent memory unit and a relationship layer  ...  neural network for gait analysis and recognition. • DCLSTM [19] (dual-channel LSTM) is a temporal LSTM model for multimodal gait recognition. • Q-BTDNN [38] (Q-backpropagated time-delay neural network  ... 
arXiv:2101.02458v1 fatcat:rr7fnv3okrc5thxmsqtdwoly3i

HMS: Hierarchical Modality Selection for Efficient Video Recognition [article]

Zejia Weng, Zuxuan Wu, Hengduo Li, Yu-Gang Jiang
2021 arXiv   pre-print
This paper introduces Hierarchical Modality Selection (HMS), a simple yet efficient multimodal learning framework for efficient video recognition.  ...  Videos are multimodal in nature. Conventional video recognition pipelines typically fuse multimodal features for improved performance.  ...  3D convolutional layers to 2D networks to learn spatio-temporal features jointly.  ... 
arXiv:2104.09760v2 fatcat:js2whnimvvbhfp5uzenqu3mlvq

Differential Recurrent Neural Networks for Action Recognition

Vivek Veeriah, Naifan Zhuang, Guo-Jun Qi
2015 2015 IEEE International Conference on Computer Vision (ICCV)  
The long short-term memory (LSTM) neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences.  ...  To address this problem, we propose a differential gating scheme for the LSTM neural network, which emphasizes on the change in information gain caused by the salient motions between the successive frames  ...  Recently, the huge success of deep networks in image classification [18] and speech recognition [11] has inspired many researchers to apply the deep neural networks, such as 3D Convolutional Neural  ... 
doi:10.1109/iccv.2015.460 dblp:conf/iccv/VeeriahZQ15 fatcat:eg35qye7lnai7g5s42czp6pxeq

A Hierarchical Deep Temporal Model for Group Activity Recognition

Mostafa S. Ibrahim, Srikanth Muralidharan, Zhiwei Deng, Arash Vahdat, Greg Mori
2016 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
To make use of these observations, we present a 2-stage deep temporal model for the group activity recognition problem.  ...  In group activity recognition, the temporal dynamics of the whole activity can be inferred based on the dynamics of the individual people representing the activity.  ...  Acknowledgements This work was supported by grants from NSERC and Disney Research.  ... 
doi:10.1109/cvpr.2016.217 dblp:conf/cvpr/IbrahimMDVM16 fatcat:m6z36t23wjb5hcmv4ucm32uamq

Human Action Recognition from Various Data Modalities: A Review [article]

Zehua Sun, Qiuhong Ke, Hossein Rahmani, Mohammed Bennamoun, Gang Wang, Jun Liu
2021 arXiv   pre-print
Specifically, we review the current mainstream deep learning methods for single data modalities and multiple data modalities, including the fusion-based and the co-learning-based frameworks.  ...  In this paper, we present a comprehensive survey of recent progress in deep learning methods for HAR based on the type of input data modality.  ...  spatio-temporal feature gating to enhance HAR.  ... 
arXiv:2012.11866v4 fatcat:twjnaur2jzahznci6clkadylay

A Hierarchical Deep Temporal Model for Group Activity Recognition [article]

Moustafa Ibrahim, Srikanth Muralidharan, Zhiwei Deng, Arash Vahdat, Greg Mori
2016 arXiv   pre-print
To make use of these observations, we present a 2-stage deep temporal model for the group activity recognition problem.  ...  In group activity recognition, the temporal dynamics of the whole activity can be inferred based on the dynamics of the individual people representing the activity.  ...  Acknowledgements This work was supported by grants from NSERC and Disney Research.  ... 
arXiv:1511.06040v2 fatcat:danywqdq7beandbkdaagv2fw3i

Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN [article]

Novanto Yudistira, Muthu Subash Kavitha, Takio Kurita
2021 arXiv   pre-print
3D Convolutional Neural Network (3D CNN) captures spatial and temporal information on 3D data such as video sequences.  ...  ii) implement attention gating network to improve the accuracy of the action recognition.  ...  ACKNOWLEDGMENT The authors would like to thank KAKENHI project no. 16K00239 for funding the research.  ... 
arXiv:2012.09542v2 fatcat:kph25ge5hzfl5gugazp5lvzdxm

Sign Language Recognition Analysis using Multimodal Data [article]

Al Amin Hosain, Panneer Selvam Santhalingam, Parth Pathak, Jana Kosecka, Huzefa Rangwala
2019 arXiv   pre-print
In this work, we investigate the feasibility of using skeletal and RGB video data for sign language recognition using a combination of different deep learning architectures.  ...  With the advancement of depth sensors, skeletal data is used for applications like video analysis and activity recognition.  ...  for digital assistants. Index Terms: neural networks, deep learning, modality-fusion, sign language recognition.  ... 
arXiv:1909.11232v1 fatcat:owkrqtzc6ngrna5bdtsru26gqq

Deep Facial Expression Recognition: A Survey [article]

Shan Li, Weihong Deng
2018 arXiv   pre-print
With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural  ...  For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and  ...  techniques for multimodal affect recognition.  ... 
arXiv:1804.08348v2 fatcat:katpvrizybha5bgy6bepfi3xpe

A Multi-scale Approach to Gesture Detection and Recognition

Natalia Neverova, Christian Wolf, Giulio Paci, Giacomo Sommavilla, Graham W. Taylor, Florian Nebout
2013 2013 IEEE International Conference on Computer Vision Workshops  
Finally, we employ a Recurrent Neural Network for modeling large-scale temporal dependencies, data fusion and ultimately gesture classification.  ...  Our experiments on the 2013 Challenge on Multimodal Gesture Recognition dataset have demonstrated that using multiple modalities at several spatial and temporal scales leads to a significant increase in  ...  Finally, we use extracted blocks for supervised training of a convolutional network [17] consisting of 2 convolutional layers with tanh activations and 2 sub-sampling layers (ConvNet in Fig. 2).  ... 
doi:10.1109/iccvw.2013.69 dblp:conf/iccvw/Neverova0PSTN13 fatcat:lerju4ym75bmjjinrwp74iaedi

Heterogeneous Non-Local Fusion for Multimodal Activity Recognition

Petr Byvshev, Pascal Mettes, Yu Xiao
2020 Proceedings of the 2020 International Conference on Multimedia Retrieval  
To further promote research into multimodal activity recognition, we introduce GloVid, a first-person activity dataset captured with video recordings and smart glove sensor readings.  ...  Here, we propose an activity network that fuses multimodal inputs coming from completely different and heterogeneous sensors. We frame such a heterogeneous fusion as a non-local operation.  ...  Within the last years, video recognition with deep spatio-temporal networks has become the dominant research direction.  ... 
doi:10.1145/3372278.3390675 dblp:conf/mir/ByvshevMX20 fatcat:5ot2q5pumrdydnnnwc6qqh27nu

Analysis of Deep Neural Networks For Human Activity Recognition in Videos – A Systematic Literature Review

Hadiqa Aman Ullah, Sukumar Letchmunan, M. Sultan Zia, Umair Muneer Butt, Fadratul Hafinaz Hassan
2021 IEEE Access  
The scope of this study is to learn the impact of deep neural architecture for Spatio-temporal feature extraction for improved activity classification.  ...  Deep learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNN) are highly effective in learning complex activities due to their characteristics of local dependency  ...  She has previously served as a lecturer in the field of Computer Science and IT and has one journal publication before. Her research interests are Data Science, Machine learning, and Computer vision.  ... 
doi:10.1109/access.2021.3110610 fatcat:ussooxm7azfljpb5prsm7creaa