Gated spatio and temporal convolutional neural network for activity recognition: towards gated multimodal deep learning
2017
EURASIP Journal on Image and Video Processing
A mixture of experts via a gating Convolutional Neural Network (CNN) is one promising architecture for adaptively weighting every sample within a dataset. ...
Human activity recognition requires both visual and temporal cues, making it challenging to integrate these important modalities. ...
We thank Kim Moravec, PhD, from Edanz Group (www.edanzediting.com/ac) for editing a draft of this manuscript. ...
doi:10.1186/s13640-017-0235-9
fatcat:fjt34uls6zdbvi3cjqsocb6wdm
RGB-D Data-Based Action Recognition: A Review
2021
Sensors
The increase in the number of action recognition datasets intersects with advances in deep learning architectures and computational support, both of which offer significant research opportunities. ...
Naturally, each action-data modality—such as RGB, depth, skeleton, and infrared (IR)—has distinct characteristics; therefore, it is important to exploit the value of each modality for better action recognition ...
Figure 5. Illustration of deep learning techniques for processing RGB-D data. (a) Convolutional Neural Network (CNN). (b) Long Short-Term Memory (LSTM). (c) Graph Convolutional Network (GCN). ...
doi:10.3390/s21124246
fatcat:7dvocdy63rckne5yunhfsnr4p4
Differential Recurrent Neural Networks for Action Recognition
[article]
2015
arXiv
pre-print
The long short-term memory (LSTM) neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences. ...
To address this problem, we propose a differential gating scheme for the LSTM neural network, which emphasizes on the change in information gain caused by the salient motions between the successive frames ...
Recently, the huge success of deep networks in image classification [16] and speech recognition [9] has inspired many researchers to apply the deep neural networks, such as 3D Convolutional Neural ...
arXiv:1504.06678v1
fatcat:enbm3tvfyrfzbbbksskxipak3i
Associated Spatio-Temporal Capsule Network for Gait Recognition
[article]
2021
arXiv
pre-print
Therefore, we here establish an automated learning system, with an associated spatio-temporal capsule network (ASTCapsNet) trained on multi-sensor datasets, to analyze multimodal information for gait recognition ...
Specifically, we first design a low-level feature extractor and a high-level feature extractor for spatio-temporal feature extraction of gait with a novel recurrent memory unit and a relationship layer ...
neural network for gait analysis and recognition. • DCLSTM [19] (dual-channel LSTM) is a temporal LSTM model for multimodal gait recognition. • Q-BTDNN [38] (Q-backpropagated time-delay neural network ...
arXiv:2101.02458v1
fatcat:rr7fnv3okrc5thxmsqtdwoly3i
HMS: Hierarchical Modality Selection for Efficient Video Recognition
[article]
2021
arXiv
pre-print
This paper introduces Hierarchical Modality Selection (HMS), a simple yet efficient multimodal learning framework for efficient video recognition. ...
Videos are multimodal in nature. Conventional video recognition pipelines typically fuse multimodal features for improved performance. ...
3D convolutional layers to 2D networks to learn spatio-temporal features jointly. ...
arXiv:2104.09760v2
fatcat:js2whnimvvbhfp5uzenqu3mlvq
Differential Recurrent Neural Networks for Action Recognition
2015
2015 IEEE International Conference on Computer Vision (ICCV)
The long short-term memory (LSTM) neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences. ...
To address this problem, we propose a differential gating scheme for the LSTM neural network, which emphasizes on the change in information gain caused by the salient motions between the successive frames ...
Recently, the huge success of deep networks in image classification [18] and speech recognition [11] has inspired many researchers to apply the deep neural networks, such as 3D Convolutional Neural ...
doi:10.1109/iccv.2015.460
dblp:conf/iccv/VeeriahZQ15
fatcat:eg35qye7lnai7g5s42czp6pxeq
A Hierarchical Deep Temporal Model for Group Activity Recognition
2016
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
To make use of these observations, we present a 2-stage deep temporal model for the group activity recognition problem. ...
In group activity recognition, the temporal dynamics of the whole activity can be inferred based on the dynamics of the individual people representing the activity. ...
Acknowledgements This work was supported by grants from NSERC and Disney Research. ...
doi:10.1109/cvpr.2016.217
dblp:conf/cvpr/IbrahimMDVM16
fatcat:m6z36t23wjb5hcmv4ucm32uamq
Human Action Recognition from Various Data Modalities: A Review
[article]
2021
arXiv
pre-print
Specifically, we review the current mainstream deep learning methods for single data modalities and multiple data modalities, including the fusion-based and the co-learning-based frameworks. ...
In this paper, we present a comprehensive survey of recent progress in deep learning methods for HAR based on the type of input data modality. ...
spatio-temporal feature gating to enhance HAR. ...
arXiv:2012.11866v4
fatcat:twjnaur2jzahznci6clkadylay
A Hierarchical Deep Temporal Model for Group Activity Recognition
[article]
2016
arXiv
pre-print
To make use of these observations, we present a 2-stage deep temporal model for the group activity recognition problem. ...
In group activity recognition, the temporal dynamics of the whole activity can be inferred based on the dynamics of the individual people representing the activity. ...
Acknowledgements This work was supported by grants from NSERC and Disney Research. ...
arXiv:1511.06040v2
fatcat:danywqdq7beandbkdaagv2fw3i
Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN
[article]
2021
arXiv
pre-print
3D Convolutional Neural Network (3D CNN) captures spatial and temporal information on 3D data such as video sequences. ...
ii) implement attention gating network to improve the accuracy of the action recognition. ...
ACKNOWLEDGMENT The authors would like to thank KAKENHI project no. 16K00239 for funding the research. ...
arXiv:2012.09542v2
fatcat:kph25ge5hzfl5gugazp5lvzdxm
Sign Language Recognition Analysis using Multimodal Data
[article]
2019
arXiv
pre-print
In this work, we investigate the feasibility of using skeletal and RGB video data for sign language recognition using a combination of different deep learning architectures. ...
With the advancement of depth sensors, skeletal data is used for applications like video analysis and activity recognition. ...
for digital assistants. Index Terms: neural networks, deep learning, modality-fusion, sign language recognition ...
arXiv:1909.11232v1
fatcat:owkrqtzc6ngrna5bdtsru26gqq
Deep Facial Expression Recognition: A Survey
[article]
2018
arXiv
pre-print
With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural ...
For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and ...
techniques for multimodal affect recognition. ...
arXiv:1804.08348v2
fatcat:katpvrizybha5bgy6bepfi3xpe
A Multi-scale Approach to Gesture Detection and Recognition
2013
2013 IEEE International Conference on Computer Vision Workshops
Finally, we employ a Recurrent Neural Network for modeling large-scale temporal dependencies, data fusion and ultimately gesture classification. ...
Our experiments on the 2013 Challenge on Multimodal Gesture Recognition dataset have demonstrated that using multiple modalities at several spatial and temporal scales leads to a significant increase in ...
Finally, we use extracted blocks for supervised training of a convolutional network [17] consisting of 2 convolutional layers with tanh activations and 2 sub-sampling layers (ConvNet in Fig. 2). ...
doi:10.1109/iccvw.2013.69
dblp:conf/iccvw/Neverova0PSTN13
fatcat:lerju4ym75bmjjinrwp74iaedi
Heterogeneous Non-Local Fusion for Multimodal Activity Recognition
2020
Proceedings of the 2020 International Conference on Multimedia Retrieval
To further promote research into multimodal activity recognition, we introduce GloVid, a first-person activity dataset captured with video recordings and smart glove sensor readings. ...
Here, we propose an activity network that fuses multimodal inputs coming from completely different and heterogeneous sensors. We frame such a heterogeneous fusion as a non-local operation. ...
Within the last years, video recognition with deep spatio-temporal networks has become the dominant research direction. ...
doi:10.1145/3372278.3390675
dblp:conf/mir/ByvshevMX20
fatcat:5ot2q5pumrdydnnnwc6qqh27nu
Analysis of Deep Neural Networks For Human Activity Recognition in Videos – A Systematic Literature Review
2021
IEEE Access
The scope of this study is to learn the impact of deep neural architecture for Spatio-temporal feature extraction for improved activity classification. ...
Deep learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNN) are highly effective in learning complex activities due to their characteristics of local dependency ...
She has previously served as a lecturer in the field of Computer Science and IT and has one journal publication before. Her research interests are Data Science, Machine learning, and Computer vision. ...
doi:10.1109/access.2021.3110610
fatcat:ussooxm7azfljpb5prsm7creaa