LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition [article]

Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis
2019 arXiv   pre-print
This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios.  ...  video frames at a finer scale to obtain more details.  ...  In this spirit, we explore the problem of dynamically allocating computational resources for video recognition.  ... 
arXiv:1912.01601v1 fatcat:lgazytnsorgannmagbehx6apde

Fast sign language recognition benefited from low rank approximation

Hanjie Wang, Xiujuan Chai, Yu Zhou, Xilin Chen
2015 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)  
With the key frame selection and the variant number of hidden states determination, an advanced framework based on HMMs for Sign Language Recognition (SLR) is proposed, which is denoted as Light-HMMs (  ...  This paper proposes a framework based on the Hidden Markov Models (HMMs) benefited from the low rank approximation of the original sign videos for two aspects.  ...  For example, on a dataset with vocabulary of twelve with depth cue, Kurakin et al. [12] proposed a realtime system for dynamic hand gesture recognition on ASL.  ... 
doi:10.1109/fg.2015.7163092 dblp:conf/fgr/WangCZC15 fatcat:cfdsubndrrhuhfhpbxnmox6cba

Distributed Face Recognition Based on Load Balancing and Dynamic Prediction

Fangyuan Zou, Jing Li, Weidong Min
2019 Applied Sciences  
To this end, a new distributed face recognition framework based on load balancing and dynamic prediction is proposed in this paper. The framework consists of a server and multiple agents.  ...  With the dramatic expansion of large-scale videos, traditional centralized face recognition methods cannot meet the demands of time efficiency and expansibility, and thus distributed face recognition models  ...  In view of these problems, a distributed face recognition framework based on load balancing and dynamic prediction is proposed in this paper.  ... 
doi:10.3390/app9040794 fatcat:n7a7izstezdg3p3kz72qrf7e3y

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition [article]

Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis
2021 arXiv   pre-print
3D convolutional networks are prevalent for video recognition.  ...  Exploiting large variations among different videos, we introduce Ada3D, a conditional computation framework that learns instance-specific 3D usage policies to determine frames and convolution layers to  ...  Then, conditioned on the derived policies, dynamic inference is performed on a pretrained 3D network with selected frames and 3D convolutions for fast recognition.  ... 
arXiv:2012.14950v2 fatcat:glt4jkxn5zcdhl2kls35ax6bia

Listen to Look: Action Recognition by Previewing Audio [article]

Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani
2020 arXiv   pre-print
We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies.  ...  redundancy for efficient video-level recognition.  ...  Acknowledgements: Thanks to Bruno Korbar, Zuxuan Wu, and Wenhao Wu for help with experiments and to Weiyao Wang, Du Tran, and the UT Austin vision group for helpful discussions.  ... 
arXiv:1912.04487v3 fatcat:w3smjfakfze4pcg7wc2iobvmja

View-invariant Deep Architecture for Human Action Recognition using late fusion [article]

Chhavi Dhiman, Dinesh Kumar Vishwakarma
2019 arXiv   pre-print
Human action Recognition for unknown views is a challenging task.  ...  We propose a view-invariant deep human action recognition framework, which is a novel integration of two important action cues: motion and shape temporal dynamics (STD).  ...  Therefore, we propose a view invariant two-stream deep human action recognition framework, which is a fusion of Shape Temporal Dynamic (STD) stream and motion stream.  ... 
arXiv:1912.03632v1 fatcat:uf7uvzm7qff7tdecrmlpn2hkxi

A Real-time Fire Detection Model Based on Cascade Strategy

Jing Wu, Chunxue Wu, Yan Wu
2018 International Journal of Software & Hardware Research in Engineering  
In this paper, a fire detection model, which is as fast as possible and can be applied to real-time and effective detection of smoke and flame in video surveillance, is presented.  ...  Firstly, the model locates the dynamic foreground object in the frame of the video, and obtains the areas where smoke or flame may exist, which greatly reduces the number of Regions of Interest (RoIs).  ...  ACKNOWLEDGMENTS The authors would like to appreciate all anonymous reviewers for their insightful comments and constructive suggestions to polish this paper in high quality.  ... 
doi:10.26821/ijshre.6.11.2018.61107 fatcat:if5l3modknec3inuyobwv275hy

Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion [article]

Hao Tang, Hong Liu, Wei Xiao, Nicu Sebe
2019 arXiv   pre-print
To bridge it, this work combines image entropy and density clustering to exploit the key frames from hand gesture video for further feature extraction, which can improve the efficiency of recognition.  ...  Gesture recognition is a hot topic in computer vision and pattern recognition, which plays a vitally important role in natural human-computer interface.  ...  Acknowledgments This work is partially supported by National Natural Science Foundation of China (NSFC, U1613209), Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467  ... 
arXiv:1901.04622v1 fatcat:kfyh3ucipfh57myhtcvsw2xkka

Video-Based Face Recognition: State of the Art [chapter]

Zhaoxiang Zhang, Chao Wang, Yunhong Wang
2011 Lecture Notes in Computer Science  
Face recognition in videos is a hot topic in computer vision and biometrics over many years.  ...  Related to applications, we divide the existing video based face recognition approaches into two categories: video-image based methods and video-video based methods, which are surveyed and analyzed in  ...  One solution is based on frame selection.  ... 
doi:10.1007/978-3-642-25449-9_1 fatcat:c2z4v4arlzeelhw67ogg7jocwq

Local Fast R-CNN Flow for Object-Centric Event Recognition in Complex Traffic Scenes [chapter]

Qin Gu, Jianyu Yang, Wei Qi Yan, Yanqiang Li, Reinhard Klette
2018 Lecture Notes in Computer Science  
We propose a novel event-recognition framework using deep local flow in a fast region-based convolutional neural network (R-CNN).  ...  Second, a deep belief propagation method is proposed for the calculation of local fast R-CNN flow (LFRCF) between local convolutional feature matrices of two non-adjacent frames in a sequence.  ...  In this paper, we use a fast R-CNN framework for multi-scale object detection and event hypothesis generation.  ... 
doi:10.1007/978-3-319-92753-4_34 fatcat:h3nyn3jzrzct7fa5orim7mkaqe

Dense SIFT–Flow based Architecture for Recognizing Hand Gestures

Suni S S, K Gopakumar
2020 Advances in Science, Technology and Engineering Systems  
Initially, a combination of three frames differencing and skin filtering technique is used for hand detection to reduce the computational complexity followed by a SIFT flow technique to extract the features  ...  This leads to the motivation in developing a dense Scale Invariant Feature Transform (SIFT) flow based architecture for recognizing dynamic hand gestures.  ...  A framework is developed for dynamic hand gesture recognition, wherein multimodal behavior of the image sequences in terms of spatial coherence and motion is extracted.  ... 
doi:10.25046/aj0505115 fatcat:5hqcvar2vvedfibwcqnn6scsxy

A Survey Of Activity Recognition And Understanding The Behavior In Video Survelliance [article]

A. R. Revathi, Dhananjay Kumar
2012 arXiv   pre-print
This paper presents a review of human activity recognition and behaviour understanding in video sequence.  ...  It describes techniques that use to define a general set of activities that are applicable to a wide range of scenes and environments in video sequence.  ...  Rajagopal, A Shape Based Object Classification for Automated Video Surveillance with Feature Selection [8]. 2012 Robert Sorschag A Flexible Object-of-Interest Annotation Framework for Online Video  ... 
arXiv:1207.6774v1 fatcat:zau7izoouffe7livk6zvxrn5sq

Dynamic Sampling Networks for Efficient Action Recognition in Videos [article]

Yin-Dong Zheng, Zhaoyang Liu, Tong Lu, Limin Wang
2020 arXiv   pre-print
To address these issues, we propose a new framework for action recognition in videos, called Dynamic Sampling Networks (DSN), by designing a dynamic sampling module to improve the discriminative power  ...  In particular, given an input video, we train an observation network in an associative reinforcement learning setting to maximize the rewards of the selected clips with a correct prediction.  ...  Dynamic clip sampling for action recognition. DSN framework dynamically selects a subset of clips from video for action recognition.  ... 
arXiv:2006.15560v1 fatcat:um5osmhlcvfbhba5k2ucepqxe4

SCAR: Dynamic Adaptation for Person Detection and Persistence Analysis in Unconstrained Videos [chapter]

George Kamberov, Matt Burlick, Lazaros Karydas, Olga Koteoglou
2012 Lecture Notes in Computer Science  
We describe a new framework for detection and persistence analysis in noisy and cluttered videos.  ...  In many forensic and data analytics applications there is a need to detect whether and for how long a specific person is present in a video.  ...  In addition, we evaluated a baseline method Picasa BL for "detection, recognition, and tracking by per-frame detection and recognition".  ... 
doi:10.1007/978-3-642-33191-6_18 fatcat:ymhnskauajaalfhtchvkh4igci

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition [article]

Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang
2018 arXiv   pre-print
a fast and robust approach.  ...  Motion representation plays a vital role in human action recognition in videos.  ...  Concurrent with our work, another state-of-the-art method applies a strategy called ranked pool [13] that generates a fast video-level descriptor, namely, the dynamic images [3] .  ... 
arXiv:1711.11152v2 fatcat:fst5sfspzncp3mxzprhks2ktka