Selective Hypergraph Convolutional Networks for Skeleton-based Action Recognition

Yiran Zhu, Guangji Huang, Xing Xu, Yanli Ji, Fumin Shen
2022 Proceedings of the 2022 International Conference on Multimedia Retrieval  
In skeleton-based action recognition, Graph Convolutional Networks (GCNs) have achieved remarkable performance since the skeleton representation of human action can be naturally modeled by the graph structure  ...  The SHC module represents the human skeleton as the graph and hypergraph to fully extract multi-scale information, and selectively fuse features at various scales.  ...  ICMR '22, June 27-30, 2022, Newark, NJ, USA. connected classifier with global average pooling for action classification.  ... 
doi:10.1145/3512527.3531367 fatcat:ardt3crf5ba2nmjvdbijd5tcte

Multi-scale Mixed Dense Graph Convolution Network for Skeleton-based Action Recognition

Hailun Xia, Xinkai Gao
2021 IEEE Access  
INDEX TERMS Dense graph convolution, spatial and temporal attention module, multi-scale mixed temporal convolution, skeleton-based action recognition.  ...  In skeleton-based action recognition, the approaches based on graph convolutional networks (GCN) have achieved remarkable performance by modeling spatial-temporal graphs to explore the physical dependencies  ...  the combination of different scale convolution kernels. • On three large-scale datasets for skeleton-based action recognition, our model achieves excellent performance.  ... 
doi:10.1109/access.2020.3049029 fatcat:xlmmcsmp3vbnvj422wctwwjiei

Deep Texture Manifold for Ground Terrain Recognition

Jia Xue, Hang Zhang, Kristin Dana
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
We present a texture network called Deep Encoding Pooling Network (DEP) for the task of ground terrain recognition.  ...  The resultant network shows excellent performance not only for GTOS-mobile, but also for more general databases (MINC and DTD).  ...  A TITAN X used for this research was donated by the NVIDIA Corporation.  ... 
doi:10.1109/cvpr.2018.00065 dblp:conf/cvpr/Xue0D18 fatcat:zlgprc233ffvfjs4zgj2tngfqq

Deep Texture Manifold for Ground Terrain Recognition [article]

Jia Xue, Hang Zhang, Kristin Dana
2018 arXiv   pre-print
We present a texture network called Deep Encoding Pooling Network (DEP) for the task of ground terrain recognition.  ...  The resultant network shows excellent performance not only for GTOS-mobile, but also for more general databases (MINC and DTD).  ...  A TITAN X used for this research was donated by the NVIDIA Corporation.  ... 
arXiv:1803.10896v2 fatcat:6khi35bq5zhe7pxmgscigrh7n4

LEARNING SPATIO-TEMPORAL FEATURE EXTRACTION USING RESIDUAL FRAMES WITH NEURAL NETWORKS FOR HUMAN ACTION RECOGNITION

C.INDHUMATHI, Dr.V.MURUGAN, Dr.G.MUTHULAKSHMI
2022 Zenodo  
HAR (Human Action Recognition) highly demands efficient computation. This research proposed a method for selecting residual frames and keyframes to eliminate redundant information from videos.  ...  This method combines the extraction of spatial and temporal features. These features were extracted using the VGG16 (Visual Geometry Group) network and classified using Multi SVM classifier.  ...  To improve the belief network, multi-scale input data, spatiotemporal Deep Belief Network (DBN), and different pooling strategies are analysed [2].  ... 
doi:10.5281/zenodo.6655088 fatcat:qsto4ftp5rdofp7lfzqzucigoe

Skeleton-Based Action Recognition using Multi-Scale and Multi-Stream Improved Graph Convolutional Network

Wang Li, Xu Liu, Zheng Liu, Feixiang Du, Qiang Zou
2020 IEEE Access  
Also, multi-scale information always has necessary implications in computer vision algorithms [16]-[18], but it is difficult for these existing models to fuse multiscale information for action recognition  ...  We first introduce the multi-scale mechanism to the skeleton-based action recognition tasks.  ... 
doi:10.1109/access.2020.3014445 fatcat:55gqhhvbsngevmxqbtoq7l7tmi

Visual Concept Reasoning Networks [article]

Taesup Kim, Sungwoong Kim, Yoshua Bengio
2020 arXiv   pre-print
Extensive experiments on visual recognition tasks such as image classification, semantic segmentation, object detection, scene recognition, and action recognition show that our proposed model, VCRNet,  ...  A split-transform-merge strategy has been broadly used as an architectural constraint in convolutional neural networks for visual recognition tasks.  ...  Scene Recognition and Action Recognition Places365 [31] is a dataset labeled with scene semantic categories for the scene recognition task.  ... 
arXiv:2008.11783v1 fatcat:ibqnbfkelbabngogis2jc543ie

Hierarchy Spatial-Temporal Transformer for Action Recognition in Short Videos [chapter]

Guoyong Cai, Yumeng Cai
2020 Frontiers in Artificial Intelligence and Applications  
Short videos action recognition based on deep learning has made a series of important progress; most of the proposed methods are based on 3D Convolution neural networks (3D CNN) and Two Stream architecture  ...  This work aims to build a network to learn better features and reduce the scale of parameters.  ...  Table 2. List of the baseline  ... 
doi:10.3233/faia200754 fatcat:sdlqx4v2wzekve6waovvnqsc24

Multi-Scale Adaptive Aggregate Graph Convolutional Network for Skeleton-Based Action Recognition

Zhiyun Zheng, Yizhou Wang, Xingjin Zhang, Junfeng Wang
2022 Applied Sciences  
In this paper, we propose a multi-scale adaptive aggregate graph convolution network (MSAAGCN) for skeleton-based action recognition.  ...  First, we designed a multi-scale spatial GCN to aggregate the remote and multi-order semantic information of the skeleton data and comprehensively model the internal relations of the human body for feature  ...  Conclusions In this work, we proposed a multi-stream model consisting of multi-scale aggregate GCN, multi-scale adaptive TCN and STCAtt modules for skeleton-based action recognition.  ... 
doi:10.3390/app12031402 fatcat:3ceyzh5tjbb3jbkvx4a5nbeytm

Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement [article]

Bing Li, Jiaxin Chen, Dongming Zhang, Xiuguo Bao, Di Huang
2022 arXiv   pre-print
Particularly, the motion stream employs a multi-scale block embedded with a denoising module to enhance representation learning.  ...  attentive local motion features and CMA further combines the two modalities with selective feature augmentation.  ...  Conclusion In this paper, we propose an Attentive Cross-modal Interaction Network with Motion Enhancement (MEACI-Net) for compressed video action recognition.  ... 
arXiv:2205.03569v3 fatcat:2lqksq3b7baidmbjvwnhhvu5wu

Semantic Image Networks for Human Action Recognition [article]

Sunder Ali Khowaja, Seok-Lyong Lee
2019 arXiv   pre-print
In this paper, we propose the use of a semantic image, an improved representation for video analysis, principally in combination with Inception networks.  ...  , (iii) The use of LSTM leverages the temporal variance information from approximate rank pooling to model the action behavior better than the base network, (iv) the proposed representations can be adaptive  ...  Inception-ResNetv2 uses cheaper inception blocks for its combination with residual networks.  ... 
arXiv:1901.06792v1 fatcat:lloazaywnnhphe3poe7f6n6dci

Dynamic Gesture Recognition Using Surface EMG Signals Based on Multi-Stream Residual Network

Zhiwen Yang, Du Jiang, Ying Sun, Bo Tao, Xiliang Tong, Guozhang Jiang, Manman Xu, Juntong Yun, Ying Liu, Baojia Chen, Jianyi Kong
2021 Frontiers in Bioengineering and Biotechnology  
Therefore, a multi-stream residual network (MResLSTM) is proposed for dynamic hand movement recognition. This study aims to improve the accuracy and stability of dynamic gesture recognition.  ...  We combine the residual model and the convolutional short-term memory model into a unified framework.  ...  The residual model and variant ConvLSTM model combined into a multi-stream network. For a multi-stream network, each stream independently learns representative features by ResNet.  ... 
doi:10.3389/fbioe.2021.779353 pmid:34746114 pmcid:PMC8569623 fatcat:rxnninyw45aohba56fzlp6chre

Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN [article]

Novanto Yudistira, Muthu Subash Kavitha, Takio Kurita
2021 arXiv   pre-print
ii) implement attention gating network to improve the accuracy of the action recognition.  ...  Furthermore, the action recognition via attention gating on each layer produces better classification results than the baseline model.  ...  ACKNOWLEDGMENT The authors would like to thank KAKENHI project no. 16K00239 for funding the research.  ... 
arXiv:2012.09542v2 fatcat:kph25ge5hzfl5gugazp5lvzdxm

Dynamic Gesture Recognition Based on Feature Fusion Network and Variant ConvLSTM

Yuqing Peng, Huifang Tao, Wei Li, Hongtao Yuan, Tiejun Li
2020 IET Image Processing  
combines feature fusion network with variant convolutional long short-term memory (ConvLSTM).  ...  Finally, a multi-feature fusion depthwise separable network is used to learn higher-level features including depth feature information.  ...  The working principle of VConvLSTM can be expressed by X_t = Global Average Pooling(X_t), H_{t-1} = Global Average Pooling(H_{t-1}), i_t = σ(W_{xi} X_t + W_{hi} H_{t-1} + b_i), f_t = σ(W_{xf} X_t + W_{hf} H  ... 
doi:10.1049/iet-ipr.2019.1248 fatcat:jkoybfiaovbb3cnktsh7mbx4jm

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding [article]

Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie Zhou, Shilei Wen
2017 arXiv   pre-print
Our system contains three major components: two-stream sequence model, fast-forward sequence model and temporal residual neural networks.  ...  Because the challenge provides pre-extracted visual and audio features instead of the raw videos, we mainly investigate various temporal modeling approaches to aggregate the frame-level features for multi-label  ...  In contrast with [14] that performs convolutions on frame-level features to learn global video-level representations, we combine convolution and recurrent neural networks to take the advantages of both  ... 
arXiv:1707.04555v1 fatcat:nmapxga24fd7vhi2gefsc7tsy4