Interaction-Aware Spatio-Temporal Pyramid Attention Networks for Action Classification
[chapter]
2018
Lecture Notes in Computer Science
Finally, our model is embedded in general CNNs to form end-to-end attention networks for action classification. ...
To address this, we propose an effective interaction-aware self-attention model inspired by PCA to learn attention maps. ...
We propose an interaction-aware spatio-temporal pyramid attention layer. It is embedded in general CNNs to generate more discriminative attention networks for video action classification. ...
doi:10.1007/978-3-030-01270-0_23
fatcat:dn4oixqdh5ctzboatvtokx5djm
Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification
[article]
2018
arXiv
pre-print
Finally, our model is embedded in general CNNs to form end-to-end attention networks for action classification. ...
Moreover, our spatial pyramid attention is unrestricted to the number of its input feature maps so it is easily extended to a spatio-temporal version. ...
We propose an interaction-aware spatio-temporal pyramid attention layer. It is embedded in general CNNs to generate more discriminative attention networks for video action classification. ...
arXiv:1808.01106v1
fatcat:ita57gll2rbwxdsoppawkjx5yi
Analysis of Deep Neural Networks For Human Activity Recognition in Videos – A Systematic Literature Review
2021
IEEE Access
for action classification. ...
TABLE 9: Spatio-temporal human activity classification techniques analysis. ...
doi:10.1109/access.2021.3110610
fatcat:ussooxm7azfljpb5prsm7creaa
Crowd understanding and analysis
2021
IET Image Processing
These social activities are often attended by a wide range of people, which puts forward high requirements for effective management and ensures the safety of the people involved in the activities. ...
Based on the resultant action proposals, a two-stream network with a spatio-temporal structure is adopted for the action recognition task. ...
"MFP-Net: Multi-scale feature pyramid network for crowd counting" of Lei et al. introduces a feature pyramid fusion module and a feature attention-aware module. ...
doi:10.1049/ipr2.12379
fatcat:shshhjjoxngotplvg7xzefpsne
RGB-D-based Human Motion Recognition with Deep Learning: A Survey
[article]
2018
arXiv
pre-print
In particular, convolutional neural networks (CNN) have achieved great success for image-based tasks, and recurrent neural networks (RNN) are renowned for sequence-based problems. ...
Particularly, we highlighted the methods of encoding spatial-temporal-structural information inherent in video sequence, and discuss potential directions for future research. ...
[92] proposed a spatial-aware object embedding for zero-shot action localization and classification. ...
arXiv:1711.08362v2
fatcat:cugugpqeffcshnwwto4z2aw4ti
Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network
2018
Sensors
Motivated by the popular recurrent attention models in the research area of natural language processing, we propose the Attention-aware Temporal Weighted CNN (ATW CNN) for action recognition in videos, ...
Research in human action recognition has accelerated significantly since the introduction of powerful machine learning tools such as Convolutional Neural Networks (CNNs). ...
[29] introduced a spatio-temporal Laplacian pyramid coding method for action representation. ...
doi:10.3390/s18071979
pmid:29933555
pmcid:PMC6069475
fatcat:byyotu7o75amzpbtifmpkpyunm
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
[article]
2021
arXiv
pre-print
In addition, this paper also reviews advances in spatio-temporal action detection where actions are localized in both temporal and spatial dimensions. ...
The task of temporal activity detection in untrimmed videos aims to localize the temporal boundary of actions and classify the action categories. ...
Temporal Feature Pyramid Network: In a temporal feature pyramid network (TFPN), the predictions are yielded from multiple resolution feature maps. ...
arXiv:2110.00111v1
fatcat:ven4rijqmnbyxflrf6wyxfpex4
Spatio–Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks
2019
Sensors
Two main challenges in this task include how to efficiently represent spatio–temporal patterns of skeletal movements and how to learn their discriminative features for classification tasks. ...
For learning and classification tasks, we exploit Deep Convolutional Neural Networks based on the DenseNet architecture to learn directly an end-to-end mapping between input skeleton sequences and their ...
An encoding technique called Temporal Pyramid Matching (TPM) [35] was then used for keeping the temporal information and performing action classification. ...
doi:10.3390/s19081932
fatcat:sbswj2uakbhefj6yjpludvlgae
Dual-Level Decoupled Transformer for Video Captioning
[article]
2022
arXiv
pre-print
For the former, "couple" means learning spatio-temporal representation in a single model (3D CNN), resulting in the problems of disconnection in the task/pre-train domain and difficulty with end-to-end training. ...
To this end, we present 𝒟^2 - a dual-level decoupled transformer pipeline to solve the above drawbacks: (i) for video spatio-temporal representation, we decouple the process of it into "first-spatial-then-temporal ...
Li [22] adopts two layers of spatio-temporal dynamic attention for video subtitles. ...
arXiv:2205.03039v1
fatcat:omrzfavtlngotbf27d43nwe4k4
Skeleton based Activity Recognition by Fusing Part-wise Spatio-temporal and Attention Driven Residues
[article]
2019
arXiv
pre-print
To extract and learn salient features for action recognition, attention driven residues are used which enhance the performance of residual components for effective 3D skeleton-based Spatio-temporal action ...
part for action recognition by applying weighted late fusion mechanism. ...
Human skeleton-based action recognition approaches generally exploit temporal dynamics of the action [6], by developing explicit temporal dynamics models such as Fourier Temporal Pyramids (FTPs) [7] ...
arXiv:1912.00576v1
fatcat:4pg77axdxbd43p6sa6lt6fmnoe
End-to-end Video-level Representation Learning for Action Recognition
[article]
2018
arXiv
pre-print
In this paper, we build upon two-stream ConvNets and propose Deep networks with Temporal Pyramid Pooling (DTPP), an end-to-end video-level representation learning approach, to address these problems. ...
Then a temporal pyramid pooling layer is used to aggregate the frame-level features which consist of spatial and temporal cues. ...
Introduction In recent years, human action recognition has received increasing attention due to potential applications in human-robot interaction, behaviour analysis and surveillance. ...
arXiv:1711.04161v7
fatcat:qqa3rrwdejgunkepa5k56eepbi
Remarkable Skeleton Based Human Action Recognition
2020
Artificial Intelligence Evolution
In this paper, we first highlight the need for action recognition and significance of 3D skeleton data and finally, we survey the largest 3D skeleton dataset, i.e. ...
The deep learning method has been used in this field for a long time, but so far, no research has fully demonstrated its usefulness. ...
Spatio-Temporal Attention (STA)-LSTM. The authors [67] investigated the endwise spatial-temporal model for action recognition. ...
doi:10.37256/aie.122020562
fatcat:2wdzis5ax5bdfnwdhlzgvoh6xu
2021 Index IEEE Transactions on Multimedia Vol. 23
2021
IEEE transactions on multimedia
The Author Index contains the primary entry for each item, listed under the first author's name. ...
., +, TMM 2021 2471-2480 Temporal Locality-Aware Network With Dual Structure for Accurate and Fast Action Detection. ...
., +, TMM 2021 1083-1094 AFNet: Temporal Locality-Aware Network With Dual Structure for Accurate and Fast Action Detection. ...
doi:10.1109/tmm.2022.3141947
fatcat:lil2nf3vd5ehbfgtslulu7y3lq
RGB-D Data-Based Action Recognition: A Review
2021
Sensors
Classification of human actions is an ongoing research problem in computer vision. ...
Naturally, each action-data modality—such as RGB, depth, skeleton, and infrared (IR)—has distinct characteristics; therefore, it is important to exploit the value of each modality for better action recognition ...
[96] have proposed Channel-Separated Convolutional Networks (CSNN) which demonstrate the benefits of factorizing 3D convolutions by separating spatio-temporal interactions and channel interactions. ...
doi:10.3390/s21124246
fatcat:7dvocdy63rckne5yunhfsnr4p4
Action Recognition with Deep Multiple Aggregation Networks
[article]
2020
arXiv
pre-print
The latter are clearly powerless to fully exhibit the actual temporal granularity of action categories and thereby constitute a bottleneck in classification performances. ...
in action categories, has comparatively received less attention, and existing solutions rely mainly on max or averaging operations. ...
CONCLUSION We introduce in this paper a temporal pyramid approach for video action recognition. ...
arXiv:2006.04489v1
fatcat:ybavmgy33ffflnefb4sc6z4cpu