
Interaction-Aware Spatio-Temporal Pyramid Attention Networks for Action Classification [chapter]

Yang Du, Chunfeng Yuan, Bing Li, Lili Zhao, Yangxi Li, Weiming Hu
2018 Lecture Notes in Computer Science  
Finally, our model is embedded in general CNNs to form end-to-end attention networks for action classification.  ...  To address this, we propose an effective interaction-aware self-attention model inspired by PCA to learn attention maps.  ...  We propose an interaction-aware spatio-temporal pyramid attention layer. It is embedded in general CNNs to generate more discriminative attention networks for video action classification.  ... 
doi:10.1007/978-3-030-01270-0_23 fatcat:dn4oixqdh5ctzboatvtokx5djm

Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification [article]

Yang Du and Chunfeng Yuan and Bing Li and Lili Zhao and Yangxi Li and Weiming Hu
2018 arXiv   pre-print
Finally, our model is embedded in general CNNs to form end-to-end attention networks for action classification.  ...  Moreover, our spatial pyramid attention is unrestricted to the number of its input feature maps so it is easily extended to a spatio-temporal version.  ...  We propose an interaction-aware spatio-temporal pyramid attention layer. It is embedded in general CNNs to generate more discriminative attention networks for video action classification.  ... 
arXiv:1808.01106v1 fatcat:ita57gll2rbwxdsoppawkjx5yi

Analysis of Deep Neural Networks For Human Activity Recognition in Videos – A Systematic Literature Review

Hadiqa Aman Ullah, Sukumar Letchmunan, M. Sultan Zia, Umair Muneer Butt, Fadratul Hafinaz Hassan
2021 IEEE Access  
for action classification.  ...  TABLE 9: Spatio-temporal human activity classification techniques analysis.  ... 
doi:10.1109/access.2021.3110610 fatcat:ussooxm7azfljpb5prsm7creaa

Crowd understanding and analysis

Qi Wang, Bo Liu, Jianzhe Lin
2021 IET Image Processing  
These social activities are often attended by a wide range of people, which puts forward high requirements for effective management and ensures the safety of the people involved in the activities.  ...  Based on the resultant action proposals, a two-stream network with a spatio-temporal structure is adopted for the action recognition task.  ...  "MFP-Net: Multi-scale feature pyramid network for crowd counting" of Lei et al. introduces a feature pyramid fusion module and a feature attention-aware module.  ... 
doi:10.1049/ipr2.12379 fatcat:shshhjjoxngotplvg7xzefpsne

RGB-D-based Human Motion Recognition with Deep Learning: A Survey [article]

Pichao Wang and Wanqing Li and Philip Ogunbona and Jun Wan and Sergio Escalera
2018 arXiv   pre-print
In particular, convolutional neural networks (CNN) have achieved great success for image-based tasks, and recurrent neural networks (RNN) are renowned for sequence-based problems.  ...  Particularly, we highlighted the methods of encoding spatial-temporal-structural information inherent in video sequence, and discuss potential directions for future research.  ...  [92] proposed a spatial-aware object embedding for zero-shot action localization and classification.  ... 
arXiv:1711.08362v2 fatcat:cugugpqeffcshnwwto4z2aw4ti

Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network

Le Wang, Jinliang Zang, Qilin Zhang, Zhenxing Niu, Gang Hua, Nanning Zheng
2018 Sensors  
Motivated by the popular recurrent attention models in the research area of natural language processing, we propose the Attention-aware Temporal Weighted CNN (ATW CNN) for action recognition in videos,  ...  Research in human action recognition has accelerated significantly since the introduction of powerful machine learning tools such as Convolutional Neural Networks (CNNs).  ...  [29] introduced a spatio-temporal Laplacian pyramid coding method for action representation.  ... 
doi:10.3390/s18071979 pmid:29933555 pmcid:PMC6069475 fatcat:byyotu7o75amzpbtifmpkpyunm

Deep Learning-based Action Detection in Untrimmed Videos: A Survey [article]

Elahe Vahdani, Yingli Tian
2021 arXiv   pre-print
In addition, this paper also reviews advances in spatio-temporal action detection where actions are localized in both temporal and spatial dimensions.  ...  The task of temporal activity detection in untrimmed videos aims to localize the temporal boundary of actions and classify the action categories.  ...  Temporal Feature Pyramid Network: In a temporal feature pyramid network (TFPN), the predictions are yielded from multiple resolution feature maps.  ... 
arXiv:2110.00111v1 fatcat:ven4rijqmnbyxflrf6wyxfpex4

Spatio–Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks

Huy Hieu Pham, Houssam Salmane, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio A. Velastin
2019 Sensors  
Two main challenges in this task include how to efficiently represent spatiotemporal patterns of skeletal movements and how to learn their discriminative features for classification tasks.  ...  For learning and classification tasks, we exploit Deep Convolutional Neural Networks based on the DenseNet architecture to learn directly an end-to-end mapping between input skeleton sequences and their  ...  An encoding technique called Temporal Pyramid Matching (TPM) [35] was then used for keeping the temporal information and performing action classification.  ... 
doi:10.3390/s19081932 fatcat:sbswj2uakbhefj6yjpludvlgae

Dual-Level Decoupled Transformer for Video Captioning [article]

Yiqi Gao, Xinglin Hou, Wei Suo, Mengyang Sun, Tiezheng Ge, Yuning Jiang, Peng Wang
2022 arXiv   pre-print
For the former, "couple" means learning spatio-temporal representation in a single model (3DCNN), resulting in the problems of disconnection in task/pre-train domain and difficulty of end-to-end training.  ...  To this end, we present 𝒟^2 - a dual-level decoupled transformer pipeline to solve the above drawbacks: (i) for video spatio-temporal representation, we decouple the process of it into "first-spatial-then-temporal  ...  Li [22] adopts two layers of spatio-temporal dynamic attention for video subtitles.  ... 
arXiv:2205.03039v1 fatcat:omrzfavtlngotbf27d43nwe4k4

Skeleton based Activity Recognition by Fusing Part-wise Spatio-temporal and Attention Driven Residues [article]

Chhavi Dhiman, Dinesh Kumar Vishwakarma, Paras Aggarwal
2019 arXiv   pre-print
To extract and learn salient features for action recognition, attention driven residues are used which enhance the performance of residual components for effective 3D skeleton-based Spatio-temporal action  ...  part for action recognition by applying weighted late fusion mechanism.  ...  Human skeleton-based action recognition approaches generally exploit temporal dynamics of the action [6], by developing explicit temporal dynamics model such as Fourier Temporal Pyramids (FTPs) [7]  ... 
arXiv:1912.00576v1 fatcat:4pg77axdxbd43p6sa6lt6fmnoe

End-to-end Video-level Representation Learning for Action Recognition [article]

Jiagang Zhu, Wei Zou, Zheng Zhu
2018 arXiv   pre-print
In this paper, we build upon two-stream ConvNets and propose Deep networks with Temporal Pyramid Pooling (DTPP), an end-to-end video-level representation learning approach, to address these problems.  ...  Then a temporal pyramid pooling layer is used to aggregate the frame-level features which consist of spatial and temporal cues.  ...  Introduction In recent years, human action recognition has received increasing attention due to potential applications in human-robot interaction, behaviour analysis and surveillance.  ... 
arXiv:1711.04161v7 fatcat:qqa3rrwdejgunkepa5k56eepbi

Remarkable Skeleton Based Human Action Recognition

Sushma Jaiswal, Tarun Jaiswal
2020 Artificial Intelligence Evolution  
In this paper, we first highlight the need for action recognition and significance of 3D skeleton data and finally, we survey the largest 3D skeleton dataset, i.e.  ...  The deep learning method has been used in this field for a long time, but so far, no research has fully demonstrated its usefulness.  ...  Spatio-Temporal Attention (STA)-LSTM. The authors [67] investigated the endwise spatial-temporal model for action recognition.  ... 
doi:10.37256/aie.122020562 fatcat:2wdzis5ax5bdfnwdhlzgvoh6xu

2021 Index IEEE Transactions on Multimedia Vol. 23

2021 IEEE transactions on multimedia  
The Author Index contains the primary entry for each item, listed under the first author's name.  ...  ., +, TMM 2021 2471-2480 Temporal Locality-Aware Network With Dual Structure for Accurate and Fast Action Detection.  ...  ., +, TMM 2021 1083-1094 AFNet: Temporal Locality-Aware Network With Dual Structure for Accurate and Fast Action Detection.  ... 
doi:10.1109/tmm.2022.3141947 fatcat:lil2nf3vd5ehbfgtslulu7y3lq

RGB-D Data-Based Action Recognition: A Review

Muhammad Bilal Shaikh, Douglas Chai
2021 Sensors  
Classification of human actions is an ongoing research problem in computer vision.  ...  Naturally, each action-data modality—such as RGB, depth, skeleton, and infrared (IR)—has distinct characteristics; therefore, it is important to exploit the value of each modality for better action recognition  ...  [96] have proposed Channel-Separated Convolutional Networks (CSNN) which demonstrate the benefits of factorizing 3D convolutions by separating spatio-temporal interactions and channel interactions.  ... 
doi:10.3390/s21124246 fatcat:7dvocdy63rckne5yunhfsnr4p4

Action Recognition with Deep Multiple Aggregation Networks [article]

Ahmed Mazari, Hichem Sahbi
2020 arXiv   pre-print
The latter are clearly powerless to fully exhibit the actual temporal granularity of action categories and thereby constitute a bottleneck in classification performances.  ...  in action categories, has comparatively received less attention, and existing solutions rely mainly on max or averaging operations.  ...  CONCLUSION We introduce in this paper a temporal pyramid approach for video action recognition.  ... 
arXiv:2006.04489v1 fatcat:ybavmgy33ffflnefb4sc6z4cpu