Long-Short Temporal Modeling for Efficient Action Recognition [article]

Liyu Wu, Yuexian Zou, Can Zhang
2021 arXiv   pre-print
Efficient long-short temporal modeling is key for enhancing the performance of action recognition task.  ...  In this paper, we propose a new two-stream action recognition network, termed as MENet, consisting of a Motion Enhancement (ME) module and a Video-level Aggregation (VLA) module to achieve long-short temporal  ...  Some of the essential features for action recognition are the appearance and motion features. Motion features have been proved significantly effective in action recognition.  ... 
arXiv:2106.15787v1 fatcat:zbzjjmfesvauzkdagtleretilu

Hidden Two-Stream Convolutional Networks for Action Recognition [article]

Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
2018 arXiv   pre-print
State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs.  ...  Experimental results on four challenging action recognition datasets: UCF101, HMDB51, THUMOS14 and ActivityNet v1.2 show that our approach significantly outperforms the previous best real-time approaches  ...  We call our new approach hidden two-stream networks as it implicitly generates motion information for action recognition.  ... 
arXiv:1704.00389v4 fatcat:4db4gal43jfhxemhusjxl3tpg4

Literature Review of Action Recognition in the Wild [article]

Asket Kaur, Navya Rao, Tanya Joon
2019 arXiv   pre-print
The literature review presented below on Action Recognition in the wild is the in-depth study of Research Papers.  ...  Action Recognition problem in the untrimmed videos is a challenging task and most of the papers have tackled this problem using hand-crafted features with shallow learning techniques and sophisticated  ...  for action recognition.  ... 
arXiv:1911.12249v1 fatcat:46qu4wtyqvhuxcomoymdd5owcm

Multi-Modal Three-Stream Network for Action Recognition [article]

Muhammad Usman Khalid, Jie Yu
2019 arXiv   pre-print
Inspired by the successful two stream networks for action classification, additional pose features are studied and fused to enhance understanding of human action in a more abstract and semantic way.  ...  In this paper, a novel video based action recognition framework utilizing complementary cues is proposed to handle this complex problem.  ...  Finally, complementary cues for action recognition, i.e. appearance, optical flow and posture features are analyzed and fused to handle varied action classes.  ... 
arXiv:1909.03466v1 fatcat:neittxn2ozcvfkypepbc6rljha

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition [article]

Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis
2021 arXiv   pre-print
3D convolutional networks are prevalent for video recognition.  ...  Our qualitative analysis indicates that our method allocates fewer 3D convolutions and frames for "static" inputs, yet uses more for motion-intensive clips.  ...  Efficient video recognition. Extensive studies have been conducted on designing efficient network architectures for video recognition [50, 5, 35, 8, 49, 24, 34].  ... 
arXiv:2012.14950v2 fatcat:glt4jkxn5zcdhl2kls35ax6bia

A Comprehensive Study of Deep Video Action Recognition [article]

Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
2020 arXiv   pre-print
Video action recognition is one of the representative tasks for video understanding.  ...  In this paper, we provide a comprehensive survey of over 200 existing papers on deep learning for video action recognition.  ...  Acknowledgement We would like to thank Peter Gehler, Linchao Zhu and Thomas Brady for constructive feedback and fruitful discussions.  ... 
arXiv:2012.06567v1 fatcat:plqytbfck5bcndiceshix5unpa

HMS: Hierarchical Modality Selection for Efficient Video Recognition [article]

Zejia Weng, Zuxuan Wu, Hengduo Li, Yu-Gang Jiang
2021 arXiv   pre-print
This paper introduces Hierarchical Modality Selection (HMS), a simple yet efficient multimodal learning framework for efficient video recognition.  ...  Conventional video recognition pipelines typically fuse multimodal features for improved performance.  ...  Efficient Networks for Video Recognition. There is a growing interest in designing efficient network architectures for both 2D and 3D networks in video recognition tasks.  ... 
arXiv:2104.09760v2 fatcat:js2whnimvvbhfp5uzenqu3mlvq

Actionness Estimation Using Hybrid Fully Convolutional Networks [article]

Limin Wang, Yu Qiao, Xiaoou Tang, Luc Van Gool
2016 arXiv   pre-print
This paper presents a new deep architecture for actionness estimation, called hybrid fully convolutional network (H-FCN), which is composed of appearance FCN (A-FCN) and motion FCN (M-FCN).  ...  Accurate and efficient estimation of actionness is important in video analysis and may benefit other relevant tasks such as action recognition and action detection.  ...  Following the two-stream convolutional networks [30] for action recognition, we propose a hybrid fully convolutional networks (H-FCN) for the task of actionness estimation, as illustrated in Figure  ... 
arXiv:1604.07279v1 fatcat:3xrfaqnj75dv7hgkklecryxn6u

Procedural Generation of Videos to Train Deep Action Recognition Networks

Cesar Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel Lopez
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
In this work, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks.  ...  Our approach is not limited to existing motion capture sequences, and we procedurally define 14 synthetic actions.  ...  learning deep action recognition networks.  ... 
doi:10.1109/cvpr.2017.278 dblp:conf/cvpr/SouzaGCP17 fatcat:w3frbltuh5ecbn67gsrhw33vji

Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition [article]

Jiawei Chen, Jenson Hsiao, Chiu Man Ho
2020 arXiv   pre-print
Despite recent progress in the development of end-to-end solutions for video-based action recognition, achieving state-of-the-art performance still requires using auxiliary hand-crafted motion representations  ...  Human action recognition is regarded as a key cornerstone in domains such as surveillance or video understanding.  ...  It indicates residual frames indeed contain salient motion information which is important for action recognition.  ... 
arXiv:2008.01057v1 fatcat:pedtbooolrakpgylhxfyzdynga

Multi-stream CNN: Learning representations based on human-related regions for action recognition

Zhigang Tu, Wei Xie, Qianqing Qin, Ronald Poppe, Remco C. Veltkamp, Baoxin Li, Junsong Yuan
2018 Pattern Recognition  
The most successful video-based human action recognition methods rely on feature representations extracted using Convolutional Neural Networks (CNNs).  ...  Inspired by the two-stream network (TS-Net), we propose a multi-stream Convolutional Neural Network (CNN) architecture to recognize human actions.  ...  When efficiency is not an issue, a flow algorithm that is able to preserve small motion details while also handling large displacements is most suitable for action recognition.  ... 
doi:10.1016/j.patcog.2018.01.020 fatcat:ojy34is2vnfxpexoqi2utfh5l4

Directional Temporal Modeling for Action Recognition [article]

Xinyu Li, Bing Shuai, Joseph Tighe
2020 arXiv   pre-print
Our CIDC network can be attached to any activity recognition backbone network.  ...  We further visualize the activation map of our CIDC network and show that it is able to focus on more meaningful, action related parts of the frame.  ...  These results demonstrate that the proposed network is effective at learning clip-level motion information for action recognition.  ... 
arXiv:2007.11040v1 fatcat:lkfdormidjfz7bqb3basuddmti

Refined Spatial Network for Human Action Recognition

Chunlei Wu, Haiwen Cao, Weishan Zhang, Leiquan Wang, Yiwei Wei, Zexin Peng
2019 IEEE Access  
INDEX TERMS Action recognition, encoder-decoder, spatial features.  ...  However, the slenderer spatial information for action representation has lost.  ...  effectiveness on action recognition. The combination of spatial and motion information, which is called two-stream, is a mainstream on action recognition.  ... 
doi:10.1109/access.2019.2933303 fatcat:32j5a67frfey7gl4k2oaxicjvy

IF-TTN: Information Fused Temporal Transformation Network for Video Action Recognition [article]

Ke Yang, Peng Qiao, Dongsheng Li, Yong Dou
2019 arXiv   pre-print
Focusing on discriminate spatiotemporal feature learning, we propose Information Fused Temporal Transformation Network (IF-TTN) for action recognition on top of popular Temporal Segment Network (TSN) framework  ...  In the network, Information Fusion Module (IFM) is designed to fuse the appearance and motion features at multiple ConvNet levels for each video snippet, forming a short-term video descriptor.  ...  EMV-CNN [32] first used motion vectors as motion representation for real-time action recognition.  ... 
arXiv:1902.09928v2 fatcat:anza23kv2zhwnotqdu5avm4e4e

TDN: Temporal Difference Networks for Efficient Action Recognition [article]

Limin Wang, Zhan Tong, Bin Ji, Gangshan Wu
2021 arXiv   pre-print
To mitigate this issue, this paper presents a new video architecture, termed as Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal information for efficient action recognition  ...  Temporal modeling still remains challenging for action recognition in videos.  ...  Introduction Deep neural networks have witnessed great progress for action recognition in videos [14, 29, 38, 31, 6, 26, 37].  ... 
arXiv:2012.10071v2 fatcat:bylf4voblfab3e2oxhaiz3bvoa