LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition
[article]
2019
arXiv
pre-print
This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios. ...
This is achieved by a coarse LSTM and a fine LSTM operating cooperatively, as well as a conditional gating module to learn when to allocate more computation. ...
We introduce LiteEval, a resource-efficient framework suitable for both online and offline video classification, which adaptively assigns computational resources to incoming video frames. ...
arXiv:1912.01601v1
fatcat:lgazytnsorgannmagbehx6apde
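As a rough illustration of the mechanism this entry's snippet describes (a coarse LSTM and a fine LSTM working together, with a gating module deciding when to spend more computation), here is a minimal PyTorch sketch. All module names, feature sizes, and the hard 0.5 threshold are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class CoarseToFineRNN(nn.Module):
    """Sketch of a coarse/fine recurrent recognizer with a per-frame gate,
    loosely following the idea in the LiteEval snippet above."""

    def __init__(self, coarse_dim=256, fine_dim=1024, hidden=512, num_classes=200):
        super().__init__()
        self.hidden = hidden
        self.coarse_rnn = nn.LSTMCell(coarse_dim, hidden)  # cheap features, every frame
        self.fine_rnn = nn.LSTMCell(fine_dim, hidden)      # expensive features, gated
        self.gate = nn.Linear(hidden + coarse_dim, 1)      # decide: spend more compute?
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, coarse_feats, fine_feats):
        # coarse_feats: (T, B, coarse_dim); fine_feats: (T, B, fine_dim)
        T, B, _ = coarse_feats.shape
        h = coarse_feats.new_zeros(B, self.hidden)
        c = coarse_feats.new_zeros(B, self.hidden)
        for t in range(T):
            h, c = self.coarse_rnn(coarse_feats[t], (h, c))
            # Hard per-frame decision; training would use a differentiable
            # relaxation (e.g. Gumbel-Softmax) instead of this threshold.
            use_fine = (torch.sigmoid(
                self.gate(torch.cat([h, coarse_feats[t]], dim=-1))) > 0.5).float()
            # For brevity the fine branch is always evaluated here; a real
            # efficient implementation would skip it when the gate says no.
            h_fine, c_fine = self.fine_rnn(fine_feats[t], (h, c))
            h = use_fine * h_fine + (1.0 - use_fine) * h
            c = use_fine * c_fine + (1.0 - use_fine) * c
        return self.classifier(h)
```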
Moving object detection with Deep CNNs
2020
IEEE Access
This paper proposes a novel framework that consists of a coarse-grained detection as well as a fine-grained detection. ...
To the best of our knowledge, this is the first work that proposes a coarse-to-fine-grained framework to detect moving objects in high-resolution scenes. ...
We have presented a coarse-to-fine grained framework and evaluated its effectiveness with extensive experiments. ...
doi:10.1109/access.2020.2972562
fatcat:r6a3t2gfvvarlkhezbn64aros4
Attention-driven action retrieval with DTW-based 3d descriptor matching
2008
Proceeding of the 16th ACM international conference on Multimedia - MM '08
This paper presents a content-based action retrieval framework to enable effective search of near-duplicated actions in large-scale video database. ...
From visual perception viewpoint, actions in videos can capture high-level semantics for video content understanding and retrieval. ...
This coarse-to-fine solution ensures that our method remains efficient on large-scale databases without losing much accuracy. ...
doi:10.1145/1459359.1459443
dblp:conf/mm/JiSYXLL08
fatcat:afopnoufqnfwho6ijzkqzyy3yy
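The title names DTW-based descriptor matching as the core comparison step. Below is a generic dynamic time warping sketch between two per-frame descriptor sequences; it only illustrates the standard DTW recurrence and does not show the paper's 3D descriptors or its coarse-to-fine filtering stage.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Plain dynamic time warping between two descriptor sequences.
    seq_a: (Ta, D), seq_b: (Tb, D) arrays of per-frame descriptors."""
    ta, tb = len(seq_a), len(seq_b)
    # Pairwise Euclidean cost between every pair of frames.
    cost = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=-1)
    acc = np.full((ta + 1, tb + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],      # skip in seq_a
                                                 acc[i, j - 1],      # skip in seq_b
                                                 acc[i - 1, j - 1])  # match
    return acc[ta, tb]
```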
Binary Hashing CNN Features for Action Recognition
2018
KSII Transactions on Internet and Information Systems
The purpose of this work is to solve the problem of representing an entire video using Convolutional Neural Network (CNN) features for human action recognition. ...
A typical method is to use sampled video frames as inputs and corresponding labels as supervision. ...
The authors would like to thank the anonymous referees for their valuable comments and suggestions. ...
doi:10.3837/tiis.2018.09.016
fatcat:u4tsra63pbg55lmxp2pwqzsaae
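The snippet describes representing a whole video with CNN features and, per the title, hashing them into binary codes. The toy sketch below shows one generic way to do that (temporal average pooling followed by thresholding, with Hamming distance for matching); it is not the paper's specific hashing scheme.

```python
import numpy as np

def video_binary_code(frame_features):
    """frame_features: (T, D) array of per-frame CNN features.
    Average-pool over time, then binarize each dimension against the mean."""
    video_feat = frame_features.mean(axis=0)              # (D,) video-level descriptor
    return (video_feat > video_feat.mean()).astype(np.uint8)

def hamming(code_a, code_b):
    """Hamming distance between two binary codes, used for matching/retrieval."""
    return int(np.count_nonzero(code_a != code_b))
```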
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
[article]
2020
arXiv
pre-print
Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency ...
In this paper, we propose a novel approach, called AR-Net (Adaptive Resolution Network), that selects on-the-fly the optimal resolution for each frame conditioned on the input for efficient action recognition ...
Government is authorized to reproduce and distribute reprints for ...
arXiv:2007.15796v1
fatcat:ansb2ji3vnexbnn56v5sqed2ii
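The snippet states that a policy network picks an input resolution per frame. Below is a hypothetical sketch of that idea: a small network looks at a cheap downscaled view of the frame and makes a differentiable choice among candidate resolutions via straight-through Gumbel-Softmax. Layer sizes, the 84-pixel policy input, and the candidate resolutions are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResolutionPolicy(nn.Module):
    """Per-frame resolution policy sketch in the spirit of the AR-Net snippet."""

    def __init__(self, resolutions=(224, 168, 112), feat_dim=128):
        super().__init__()
        self.resolutions = resolutions
        self.encoder = nn.Sequential(            # cheap feature extractor on an 84x84 view
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, len(resolutions))

    def forward(self, frame):
        # frame: (B, 3, H, W) -> one-hot choice over candidate resolutions
        small = F.interpolate(frame, size=84, mode="bilinear", align_corners=False)
        logits = self.head(self.encoder(small))
        # Straight-through Gumbel-Softmax keeps the discrete choice trainable.
        return F.gumbel_softmax(logits, tau=1.0, hard=True)

# Usage sketch: resize each frame to its chosen resolution before the backbone.
# policy = ResolutionPolicy()
# choice = policy(frame)                                   # (B, num_resolutions), one-hot
# res = policy.resolutions[int(choice.argmax(dim=1)[0])]
# x = F.interpolate(frame, size=res, mode="bilinear", align_corners=False)
```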
Coarse to Fine Multi-Resolution Temporal Convolutional Network
[article]
2021
arXiv
pre-print
In particular, the decoder follows a coarse-to-fine structure with an implicit ensemble of multiple temporal resolutions. ...
Temporal convolutional networks (TCNs) are a commonly used architecture for temporal video segmentation. ...
refer to it as a coarse-to-fine ensemble (C2F ensemble). ...
arXiv:2105.10859v1
fatcat:6a5zp6dn5rhw3gffys7pejugjm
Cooking in the kitchen: Recognizing and Segmenting Human Activities in Videos
[article]
2016
arXiv
pre-print
As research on action recognition matures, the focus is shifting away from categorizing basic task-oriented actions using hand-segmented video datasets to understanding complex goal-oriented daily human ...
Here, we describe an end-to-end generative approach from the encoding of features to the structural modeling of complex human activities by applying Fisher vectors and temporal models for the analysis ...
We further describe a novel generative framework for the analysis of temporal structures. ...
arXiv:1508.06073v2
fatcat:lcp4jr44p5f2zkf3z5cv4kiub4
Coarse-Fine Convolutional Deep-Learning Strategy for Human Activity Recognition
2019
Sensors
The whole CNN scheme is based on a feature fusion of a fine-CNN, a medium-CNN, and a coarse-CNN. ...
This paper presents a novel framework to classify and analyze human activities. A new convolutional neural network (CNN) strategy is applied to a single user movement recognition using a smartphone. ...
It seems that the fusion of partial information from the fine-CNN, medium-CNN, and coarse-CNN makes it possible to obtain 100% correct classification for HAR activities. ...
doi:10.3390/s19071556
fatcat:kvrhudjoh5glvjilgmslxypy7a
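The snippet describes fusing features from fine, medium, and coarse CNN branches for smartphone-based activity recognition. The sketch below shows one simple way to realize that pattern with three 1-D convolutional branches over an accelerometer window at different temporal scales; the pooling factors, layer sizes, and fusion by concatenation are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleHAR(nn.Module):
    """Illustrative coarse/medium/fine 1-D CNN fusion for accelerometer HAR."""

    def __init__(self, channels=3, num_classes=6):
        super().__init__()
        def branch(pool):
            return nn.Sequential(
                nn.AvgPool1d(pool) if pool > 1 else nn.Identity(),  # coarser time scale
                nn.Conv1d(channels, 32, 5, padding=2), nn.ReLU(),
                nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            )
        self.fine, self.medium, self.coarse = branch(1), branch(2), branch(4)
        self.classifier = nn.Linear(64 * 3, num_classes)

    def forward(self, x):
        # x: (B, 3, T) window of tri-axial accelerometer samples
        fused = torch.cat([self.fine(x), self.medium(x), self.coarse(x)], dim=1)
        return self.classifier(fused)
```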
ActionCLIP: A New Paradigm for Video Action Recognition
[article]
2021
arXiv
pre-print
The canonical approach to video action recognition requires a neural model to perform a classic and standard 1-of-N majority-vote classification task. ...
Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train ...
Acknowledgements We would like to thank Zeyi Huang for his constructive suggestions and comments on this work. ...
arXiv:2109.08472v1
fatcat:dwtb4xtf6bcbfmbjq5eif5hzgi
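The snippet contrasts the usual 1-of-N classifier with a multimodal (video-text) formulation. The sketch below shows only the matching step of such a formulation: score a video embedding against embeddings of the class-name texts and pick the closest. The encoders that produce these embeddings are assumed to exist and are not shown; the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def similarity_classify(video_emb, label_text_embs, temperature=0.07):
    """video_emb: (B, D) video embeddings; label_text_embs: (N, D), one per class name.
    Returns the predicted class index per video via cosine similarity."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(label_text_embs, dim=-1)
    logits = v @ t.t() / temperature      # (B, N) scaled cosine similarities
    return logits.argmax(dim=-1)
```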
Character index
2011
2011 IEEE International Conference on Multimedia and Expo
Author index excerpt: MOBILE AUGMENTED REALITY FOR BOOKS ON A SHELF (Winston Hsu); COARSE-TO-FINE TEMPORAL OPTIMIZATION FOR VIDEO RETARGETING BASED ON SEAM CARVING (Jiwei Hu, Peng Ding); COMMERCIAL DETECTION BY MINING MAXIMAL REPEATED SEQUENCE IN AUDIO STREAM (Claudio Diniz); SHBS: A HEURISTIC ...
doi:10.1109/icme.2011.6011827
fatcat:wjy7yvkmvbbf3hj4wbyjapx5gu
Exploiting multi-level parallelism for low-latency activity recognition in streaming video
2010
Proceedings of the first annual ACM SIGMM conference on Multimedia systems - MMSys '10
Figure 1: Activity recognition on Gatwick airport video. Our system recognizes actions in full-frame-rate video with low latencies to enable interactive surveillance applications. ...
Video understanding is a computationally challenging task that is critical not only for traditionally throughput-oriented applications such as search but also latency-sensitive interactive applications ...
Second, we outline a general framework for enabling low-latency processing of full-frame-rate video that exploits both the coarse- and fine-grained parallelism inherent in typical multimedia understanding ...
doi:10.1145/1730836.1730838
dblp:conf/mmsys/ChenMPHS10
fatcat:do5bppidmfbbneiwjvh3t2pnei
Human Action Recognition with Deep Temporal Pyramids
[article]
2019
arXiv
pre-print
Top levels of this hierarchy are dedicated to coarse categories while deep levels are more suitable to fine-grained ones. ...
In this paper, we introduce a novel hierarchical aggregation design, for final pooling, that controls granularity of the learned representations w.r.t the actual granularity of action categories. ...
efficient GPU resources and reasonable size videos. ...
arXiv:1905.00745v1
fatcat:fnkqvrkrnngxhbjbv4uzgyzmsm
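The snippet describes a hierarchical aggregation whose top levels are coarse and deep levels fine-grained. As a generic stand-in for that coarse-to-fine pooling idea, the sketch below builds a temporal pyramid: frame features are average-pooled over progressively finer temporal bins and concatenated. The pyramid levels are an assumption, not the paper's hierarchy.

```python
import torch

def temporal_pyramid_pool(frame_features, levels=(1, 2, 4)):
    """frame_features: (T, D) tensor of per-frame features.
    Returns a (sum(levels) * D,) vector ordered from coarse to fine bins."""
    T, D = frame_features.shape
    pooled = []
    for bins in levels:
        edges = torch.linspace(0, T, bins + 1).long()
        for b in range(bins):
            lo = edges[b].item()
            hi = max(edges[b + 1].item(), lo + 1)   # guard against empty bins
            pooled.append(frame_features[lo:hi].mean(dim=0))
    return torch.cat(pooled)
```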
Artificial Intelligence for Text-Based Vehicle Search, Recognition, and Continuous Localization in Traffic Videos
2021
AI
Additionally, to the best of our knowledge, no metrics currently exist for evaluating the robustness and performance efficiency of a vehicle recognition model on live videos and even less so for vehicle ...
facilitate finer-grain recognition with color information; and (b) a Vehicle Recognition in Video (VRiV) dataset, a first of its kind video testbench dataset for evaluating the performance of vehicle ...
[34] proposed using a coarse-to-fine convolutional neural network architecture for achieving fine-grained vehicle make and model recognition. ...
doi:10.3390/ai2040041
fatcat:y5uegij4fvh7tl5cpplugsrxci
Mobile Video Action Recognition
[article]
2019
arXiv
pre-print
By employing MobileNetV2 as backbone, we propose a novel Temporal Trilinear Pooling (TTP) module to fuse the multiple modalities for mobile video action recognition. ...
Video action recognition, which is topical in computer vision and video analysis, aims to assign a short video clip to a pre-defined category such as brushing hair or climbing stairs. ...
According to the above considerations, we propose a lightweight framework to solve the mobile video action recognition task. ...
arXiv:1908.10155v1
fatcat:kl2wpfemezcuxfc3snyp6bpshy
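The snippet names a Temporal Trilinear Pooling module that fuses multiple modalities on a MobileNetV2 backbone. The sketch below shows only a generic low-rank trilinear fusion of three per-frame modality features via element-wise products of linear projections; the projection sizes, the tanh nonlinearity, and the temporal averaging are assumptions and this is not the paper's TTP module.

```python
import torch
import torch.nn as nn

class TrilinearFusion(nn.Module):
    """Generic low-rank trilinear pooling of three modality feature streams."""

    def __init__(self, dims=(1280, 1280, 1280), rank=512, out_dim=256):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, rank) for d in dims)
        self.out = nn.Linear(rank, out_dim)

    def forward(self, feats):
        # feats: list of three tensors, each (B, T, dim_i), e.g. RGB / diff / flow
        z = torch.ones_like(self.proj[0](feats[0]))
        for p, f in zip(self.proj, feats):
            z = z * torch.tanh(p(f))     # element-wise (Hadamard) trilinear interaction
        z = z.mean(dim=1)                # average over time
        return self.out(z)
```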
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
[article]
2021
arXiv
pre-print
Frame sampling is a fundamental problem in video action recognition due to the essential redundancy in time and limited computation resources. ...
First, we present two different motion representations to enable us to efficiently distinguish the motion-salient frames from the background. ...
The first author would like to thank Ziteng Gao and Liwei Jin for their valuable suggestions. ...
arXiv:2104.09952v2
fatcat:ufhwtquz2jdilddviyzn54rqym
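The snippet describes sampling frames guided by motion representations that highlight motion-salient frames. The sketch below implements one simple version of that idea: score each frame by plain frame differencing (an assumption here; the paper presents two motion representations), build a cumulative motion distribution, and pick frames at evenly spaced quantiles so segments with more motion contribute more frames.

```python
import numpy as np

def motion_guided_sample(frames, num_samples=8):
    """frames: (T, H, W, C) uint8 array; returns indices of selected frames."""
    frames = frames.astype(np.float32)
    # Per-frame motion score: mean absolute difference to the previous frame.
    diffs = np.abs(frames[1:] - frames[:-1]).mean(axis=(1, 2, 3))
    motion = np.concatenate([[diffs[0]], diffs]) + 1e-6   # pad the first frame
    cdf = np.cumsum(motion) / motion.sum()                # cumulative motion distribution
    # Evenly spaced quantiles of cumulative motion -> frame indices.
    targets = (np.arange(num_samples) + 0.5) / num_samples
    return np.searchsorted(cdf, targets)

# Example: idx = motion_guided_sample(video_array, num_samples=8)
```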
Showing results 1 — 15 out of 5,332 results