5,332 Hits in 4.8 sec

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition [article]

Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis
2019 arXiv   pre-print
This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource-efficient video recognition, suitable for both online and offline scenarios.  ...  This is achieved by a coarse LSTM and a fine LSTM operating cooperatively, as well as a conditional gating module to learn when to allocate more computation.  ...  We introduce LiteEval, a resource-efficient framework suitable for both online and offline video classification, which adaptively assigns computational resources to incoming video frames.  ... 
arXiv:1912.01601v1 fatcat:lgazytnsorgannmagbehx6apde
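The coarse/fine LSTM pair with a conditional gate described in the snippet can be sketched minimally as below. This is a sketch under stated assumptions: `lstm_step` is a toy recurrent update standing in for a full LSTM cell, and the sigmoid gate, weight shapes, and 0.5 threshold are illustrative choices, not the paper's actual architecture.

```python
import numpy as np

def lstm_step(x, h, W):
    # Toy recurrent update standing in for a full LSTM cell.
    return np.tanh(W @ np.concatenate([x, h]))

def lite_eval_pass(frames_coarse, frames_fine, W_c, W_f, w_gate, thresh=0.5):
    """Coarse-to-fine pass: the coarse RNN reads every frame cheaply; a
    sigmoid gate on its state decides when to spend compute on the fine RNN."""
    h_c = np.zeros(W_c.shape[0])
    h_f = np.zeros(W_f.shape[0])
    fine_calls = 0
    for x_c, x_f in zip(frames_coarse, frames_fine):
        h_c = lstm_step(x_c, h_c, W_c)
        gate = 1.0 / (1.0 + np.exp(-w_gate @ h_c))   # conditional gating module
        if gate > thresh:                            # allocate more computation
            h_f = lstm_step(x_f, h_f, W_f)
            fine_calls += 1
    return h_f, fine_calls
```

The per-frame skip is what makes the scheme suitable for online use: the decision for frame t depends only on the coarse state up to t.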

Moving object detection with Deep CNNs

Haidi Zhu, Xin Yan, Hongying Tang, Yuchao Chang, Baoqing Li, Xiaobing Yuan
2020 IEEE Access  
This paper proposes a novel framework that consists of a coarse-grained detection stage as well as a fine-grained detection stage.  ...  To the best of our knowledge, this is the first work to propose a coarse-to-fine grained framework for detecting moving objects in high-resolution scenes.  ...  We have presented a coarse-to-fine grained framework and evaluated its effectiveness with extensive experiments.  ... 
doi:10.1109/access.2020.2972562 fatcat:r6a3t2gfvvarlkhezbn64aros4

Attention-driven action retrieval with DTW-based 3d descriptor matching

Rongrong Ji, Xiaoshui Sun, Hongxun Yao, Pengfei Xu, Tianqiang Liu, Xianming Liu
2008 Proceeding of the 16th ACM international conference on Multimedia - MM '08  
This paper presents a content-based action retrieval framework to enable effective search of near-duplicated actions in a large-scale video database.  ...  From a visual perception viewpoint, actions in videos capture high-level semantics for video content understanding and retrieval.  ...  This coarse-to-fine solution ensures that our method remains efficient on a large-scale database without losing too much accuracy.  ... 
doi:10.1145/1459359.1459443 dblp:conf/mm/JiSYXLL08 fatcat:afopnoufqnfwho6ijzkqzyy3yy
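The DTW matching behind the descriptor comparison is the textbook dynamic program, sketched below. Using 1-D sequences and an absolute-difference local cost is a simplification for illustration; the paper matches 3-D descriptors.

```python
import math

def dtw(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences:
    D[i][j] = local cost + min of the three predecessor alignments."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Because DTW tolerates local stretching, a repeated sample (`[1, 2, 2, 3]` vs `[1, 2, 3]`) still aligns at zero cost, which is why it suits near-duplicate action search.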

Binary Hashing CNN Features for Action Recognition

2018 KSII Transactions on Internet and Information Systems  
The purpose of this work is to solve the problem of representing an entire video using Convolutional Neural Network (CNN) features for human action recognition.  ...  A typical method is to use sampled video frames as inputs and corresponding labels as supervision.  ...  The authors would like to thank the anonymous referees for their valuable comments and suggestions.  ... 
doi:10.3837/tiis.2018.09.016 fatcat:u4tsra63pbg55lmxp2pwqzsaae
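Representing an entire video as a compact binary code can be sketched as sign hashing of a pooled frame descriptor. The random projection below is a generic stand-in for the paper's learned hash layer, and mean pooling is an assumption; the point is the shape of the pipeline, features to code to Hamming comparison.

```python
import numpy as np

def video_hash(frame_feats, proj):
    """Mean-pool per-frame CNN features into one vector, then sign-hash it
    into a compact binary code via a projection matrix."""
    pooled = frame_feats.mean(axis=0)
    return (proj @ pooled > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.sum(a != b))
```

Retrieval or recognition then reduces to comparing fixed-length codes, which is far cheaper than comparing per-frame feature sets.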

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition [article]

Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris
2020 arXiv   pre-print
Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency  ...  In this paper, we propose a novel approach, called AR-Net (Adaptive Resolution Network), that selects on-the-fly the optimal resolution for each frame conditioned on the input for efficient action recognition  ...  Government is authorized to reproduce and distribute reprints for  ... 
arXiv:2007.15796v1 fatcat:ansb2ji3vnexbnn56v5sqed2ii
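The per-frame resolution decision can be sketched as a tiny policy head over candidate sizes. The candidate resolutions, the linear policy, and the plain argmax below are illustrative assumptions; at training time the selection must be made differentiable rather than a hard argmax.

```python
import numpy as np

RESOLUTIONS = [224, 168, 112]   # candidate input sizes (illustrative values)

def choose_resolutions(frame_feats, W_policy):
    """Policy-network sketch: per-frame logits over candidate resolutions,
    hard argmax at inference time."""
    logits = frame_feats @ W_policy          # (T, len(RESOLUTIONS))
    return [RESOLUTIONS[i] for i in logits.argmax(axis=1)]
```

Frames the policy deems easy get routed to a cheaper low-resolution pass, which is where the efficiency gain comes from.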

Coarse to Fine Multi-Resolution Temporal Convolutional Network [article]

Dipika Singhania, Rahul Rahaman, Angela Yao
2021 arXiv   pre-print
In particular, the decoder follows a coarse-to-fine structure with an implicit ensemble of multiple temporal resolutions.  ...  Temporal convolutional networks (TCNs) are a commonly used architecture for temporal video segmentation.  ...  refer to it as a coarse-to-fine ensemble (C2F ensemble).  ... 
arXiv:2105.10859v1 fatcat:6a5zp6dn5rhw3gffys7pejugjm
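The implicit ensemble over decoder temporal resolutions can be sketched as upsampling each coarse prediction to full length and averaging. Nearest-neighbour upsampling and uniform averaging are assumptions made for this sketch, not the paper's exact combination rule.

```python
import numpy as np

def c2f_ensemble(preds_by_res, T):
    """Average framewise class scores from several temporal resolutions,
    each nearest-neighbour upsampled to the full sequence length T."""
    upsampled = []
    for p in preds_by_res:                    # p: (T_res, num_classes)
        idx = (np.arange(T) * len(p)) // T    # map full-rate frame -> coarse frame
        upsampled.append(p[idx])
    return np.mean(upsampled, axis=0)
```

Coarse resolutions smooth over-segmentation errors while the fine resolution preserves boundaries; averaging trades between the two.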

Cooking in the kitchen: Recognizing and Segmenting Human Activities in Videos [article]

Hilde Kuehne and Juergen Gall and Thomas Serre
2016 arXiv   pre-print
As research on action recognition matures, the focus is shifting away from categorizing basic task-oriented actions using hand-segmented video datasets to understanding complex goal-oriented daily human  ...  Here, we describe an end-to-end generative approach from the encoding of features to the structural modeling of complex human activities by applying Fisher vectors and temporal models for the analysis  ...  We further describe a novel generative framework for the analysis of temporal structures.  ... 
arXiv:1508.06073v2 fatcat:lcp4jr44p5f2zkf3z5cv4kiub4

Coarse-Fine Convolutional Deep-Learning Strategy for Human Activity Recognition

Carlos Avilés-Cruz, Andrés Ferreyra-Ramírez, Arturo Zúñiga-López, Juan Villegas-Cortéz
2019 Sensors  
The whole CNN scheme is based on a feature fusion of a fine-CNN, a medium-CNN, and a coarse-CNN.  ...  This paper presents a novel framework to classify and analyze human activities. A new convolutional neural network (CNN) strategy is applied to single-user movement recognition using a smartphone.  ...  It seems that the fusion of the partial information given by the fine-CNN, medium-CNN, and coarse-CNN makes it possible to obtain 100% correct classification for HAR activities.  ... 
doi:10.3390/s19071556 fatcat:kvrhudjoh5glvjilgmslxypy7a
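At its simplest, the three-branch fusion reduces to concatenating the fine/medium/coarse features and classifying the fused vector. The linear head below is a hypothetical stand-in for the paper's fusion and classification layers.

```python
import numpy as np

def fused_predict(f_fine, f_medium, f_coarse, W_out):
    """Late fusion: concatenate the fine/medium/coarse branch features and
    classify the fused vector with a linear head (argmax class index)."""
    fused = np.concatenate([f_fine, f_medium, f_coarse])
    return int((W_out @ fused).argmax())
```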

ActionCLIP: A New Paradigm for Video Action Recognition [article]

Mengmeng Wang, Jiazheng Xing, Yong Liu
2021 arXiv   pre-print
The canonical approach to video action recognition has a neural model perform a classic, standard 1-of-N majority-vote task.  ...  Moreover, to handle the deficiency of label texts and make use of tremendous amounts of web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train  ...  Acknowledgements We would like to thank Zeyi Huang for his constructive suggestions and comments on this work.  ... 
arXiv:2109.08472v1 fatcat:dwtb4xtf6bcbfmbjq5eif5hzgi
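The multimodal paradigm replaces the fixed 1-of-N head with a similarity lookup against label-text embeddings. A minimal cosine-similarity version is below; the embeddings are placeholders for the outputs of the video and text encoders.

```python
import numpy as np

def classify_by_similarity(video_emb, text_embs):
    """Pick the action label whose text embedding has the highest cosine
    similarity to the video embedding, instead of using a fixed classifier."""
    v = video_emb / np.linalg.norm(video_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(t @ v))
```

Because labels are just text, new action classes can be added at inference time by embedding their names, with no retraining of a classification head.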

Character index

2011 2011 IEEE International Conference on Multimedia and Expo  
doi:10.1109/icme.2011.6011827 fatcat:wjy7yvkmvbbf3hj4wbyjapx5gu

Exploiting multi-level parallelism for low-latency activity recognition in streaming video

Ming-yu Chen, Lily Mummert, Padmanabhan Pillai, Alexander Hauptmann, Rahul Sukthankar
2010 Proceedings of the first annual ACM SIGMM conference on Multimedia systems - MMSys '10  
Figure 1: Activity recognition on Gatwick airport video. Our system recognizes actions in full-frame-rate video with low latencies to enable interactive surveillance applications.  ...  Video understanding is a computationally challenging task that is critical not only for traditionally throughput-oriented applications such as search but also for latency-sensitive interactive applications  ...  Second, we outline a general framework for enabling low-latency processing of full-frame-rate video that exploits both the coarse- and fine-grained parallelism inherent in typical multimedia understanding  ... 
doi:10.1145/1730836.1730838 dblp:conf/mmsys/ChenMPHS10 fatcat:do5bppidmfbbneiwjvh3t2pnei

Human Action Recognition with Deep Temporal Pyramids [article]

Ahmed Mazari, Hichem Sahbi
2019 arXiv   pre-print
Top levels of this hierarchy are dedicated to coarse categories while deep levels are more suitable to fine-grained ones.  ...  In this paper, we introduce a novel hierarchical aggregation design, for final pooling, that controls granularity of the learned representations w.r.t the actual granularity of action categories.  ...  efficient GPU resources and reasonable size videos.  ... 
arXiv:1905.00745v1 fatcat:fnkqvrkrnngxhbjbv4uzgyzmsm
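The coarse-to-fine temporal hierarchy can be sketched as a temporal pyramid pooling. Mean pooling and power-of-two segment splits are assumptions made for this sketch; the paper learns its hierarchical aggregation rather than fixing it.

```python
import numpy as np

def temporal_pyramid_pool(frame_feats, levels=3):
    """Level l splits the sequence into 2**l segments and mean-pools each;
    concatenating all levels yields a coarse-to-fine video descriptor."""
    parts = []
    for level in range(levels):
        for seg in np.array_split(frame_feats, 2 ** level):
            parts.append(seg.mean(axis=0))
    return np.concatenate(parts)
```

Top levels summarize the whole clip (useful for coarse categories), while deeper levels preserve local temporal detail for fine-grained ones.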

Artificial Intelligence for Text-Based Vehicle Search, Recognition, and Continuous Localization in Traffic Videos

Karen Panetta, Landry Kezebou, Victor Oludare, James Intriligator, Sos Agaian
2021 AI  
Additionally, to the best of our knowledge, no metrics currently exist for evaluating the robustness and performance efficiency of a vehicle recognition model on live videos, and even less so for vehicle  ...  facilitate finer-grain recognition with color information; and (b) a Vehicle Recognition in Video (VRiV) dataset, a first-of-its-kind video testbench dataset for evaluating the performance of vehicle  ...  [34] proposed using a coarse-to-fine convolutional neural network architecture for achieving fine-grained vehicle make and model recognition.  ... 
doi:10.3390/ai2040041 fatcat:y5uegij4fvh7tl5cpplugsrxci

Mobile Video Action Recognition [article]

Yuqi Huo, Xiaoli Xu, Yao Lu, Yulei Niu, Zhiwu Lu, Ji-Rong Wen
2019 arXiv   pre-print
By employing MobileNetV2 as the backbone, we propose a novel Temporal Trilinear Pooling (TTP) module to fuse the multiple modalities for mobile video action recognition.  ...  Video action recognition, a topical problem in computer vision and video analysis, aims to assign a short video clip to a pre-defined category such as brushing hair or climbing stairs.  ...  Following the above considerations, we propose a lightweight framework to solve the mobile video action recognition task.  ... 
arXiv:1908.10155v1 fatcat:kl2wpfemezcuxfc3snyp6bpshy
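A trilinear interaction of three modality vectors can be sketched in low-rank form: project each modality and take the elementwise product of the projections. This common low-rank approximation is an assumption standing in for the paper's actual TTP module.

```python
import numpy as np

def trilinear_pool(x, y, z, Wx, Wy, Wz):
    """Low-rank trilinear interaction of three modality vectors: project
    each modality, then take the elementwise product of the projections."""
    return (Wx @ x) * (Wy @ y) * (Wz @ z)
```

The elementwise product keeps the fused feature the same size as one projection, which matters on mobile hardware.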

MGSampler: An Explainable Sampling Strategy for Video Action Recognition [article]

Yuan Zhi, Zhan Tong, Limin Wang, Gangshan Wu
2021 arXiv   pre-print
Frame sampling is a fundamental problem in video action recognition due to the essential redundancy in time and limited computation resources.  ...  First, we present two different motion representations to enable us to efficiently distinguish the motion-salient frames from the background.  ...  The first author would like to thank Ziteng Gao and Liwei Jin for their valuable suggestions.  ... 
arXiv:2104.09952v2 fatcat:ufhwtquz2jdilddviyzn54rqym
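Motion-guided sampling can be sketched by walking the cumulative motion curve in equal steps, so motion-salient stretches of the video contribute more sampled frames. Frame differencing as the motion representation and the exact quantile scheme below are assumptions for this sketch.

```python
import numpy as np

def mg_sample(frames, n):
    """Motion-guided sampling: per-frame motion salience from frame
    differencing, then sample at uniform quantiles of cumulative motion,
    so high-motion regions are sampled more densely."""
    diffs = np.abs(np.diff(frames, axis=0)).sum(axis=tuple(range(1, frames.ndim)))
    motion = np.concatenate([[diffs[0]], diffs])   # one salience score per frame
    cdf = np.cumsum(motion) / motion.sum()         # cumulative motion curve
    targets = (np.arange(n) + 0.5) / n             # uniform steps along the curve
    return np.searchsorted(cdf, targets).tolist()
```

Unlike uniform sampling, the same static background frame is rarely picked twice, since flat stretches of the cumulative curve are crossed by few quantile targets.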
Showing results 1 — 15 out of 5,332 results