Hierarchical Contrastive Motion Learning for Video Action Recognition [article]

Xitong Yang, Xiaodong Yang, Sifei Liu, Deqing Sun, Larry Davis, Jan Kautz
2022 arXiv   pre-print
One central question for video action recognition is how to model motion.  ...  In this paper, we present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames.  ...  Joint Training for Action Recognition Our ultimate goal is to improve video action recognition with the learned hierarchical motion features.  ... 
arXiv:2007.10321v3 fatcat:vd6n6rluavcbdi3e7k4wge2ije

Deep hierarchical pooling design for cross-granularity action recognition [article]

Ahmed Mazari, Hichem Sahbi
2020 arXiv   pre-print
In this paper, we introduce a novel hierarchical aggregation design that captures different levels of temporal granularity in action recognition.  ...  Besides being principled and well grounded, the proposed hierarchical pooling is also video-length agnostic and resilient to misalignments in actions.  ...  CONCLUSION In this paper, we introduced a hierarchical aggregation design for cross-granularity action recognition.  ... 
arXiv:2006.04473v1 fatcat:wgn72n7jgbbgvewkjkwv7mwvwq

Hi-EADN: Hierarchical Excitation Aggregation and Disentanglement Frameworks for Action Recognition Based on Videos

Zeyuan Hu, Eung-Joo Lee
2021 Symmetry  
Most existing video action recognition methods mainly rely on high-level semantic information from convolutional neural networks (CNNs) but ignore the discrepancies of different information streams.  ...  hierarchical disentanglement (SEHD) module.  ...  Hierarchical Disentanglement GAP Global Average Pooling SA multi-head self-guided Attention  ... 
doi:10.3390/sym13040662 fatcat:6mjobgpymbcgncus6z3za3saam

Video Representation Learning with Visual Tempo Consistency [article]

Ceyuan Yang, Yinghao Xu, Bo Dai, Bolei Zhou
2020 arXiv   pre-print
We propose to maximize the mutual information between representations of slow and fast videos via hierarchical contrastive learning (VTHCL).  ...  Video representations learned from VTHCL achieve the competitive performances under the self-supervision evaluation protocol for action recognition on UCF-101 (82.1%) and HMDB-51 (49.2%).  ...  Acknowledgments We thank Zhirong Wu and Yonglong Tian for their public implementation of previous works.  ... 
arXiv:2006.15489v2 fatcat:dem4jafa6naavksvkgf2b3l6ca

Hierarchical Attention Network for Action Recognition in Videos [article]

Yilin Wang, Suhang Wang, Jiliang Tang, Neil O'Hare, Yi Chang, Baoxin Li
2016 arXiv   pre-print
structures for complex human action understanding.  ...  In this paper we propose a novel approach named Hierarchical Attention Network (HAN), which enables to incorporate static spatial information, short-term motion information and long-term video temporal  ...  In this paper, we study the problem of video representation learning for action recognition.  ... 
arXiv:1607.06416v1 fatcat:6rfajs2f7rcidj7fwq3bypl3uq


Ankush Rai, Jagadeesh Kannan R
2017 Asian Journal of Pharmaceutical and Clinical Research  
Next, hierarchical recognition approaches for abnormal action states are introduced and looked at.  ...  Statistics based methodologies, syntactic methodologies, and description based methodologies for hierarchical recognition are examined in the paper.  ...  For instance non-hierarchical single layer methodologies can be effortlessly used for low-level or nuclear action recognition, for example, motion location.  ... 
doi:10.22159/ajpcr.2017.v10s1.19977 fatcat:pzroxovz75bkpilavebs2ig6fm

Recent advances in video-based human action recognition using deep learning: A review

Di Wu, Nabin Sharma, Michael Blumenstein
2017 2017 International Joint Conference on Neural Networks (IJCNN)  
This paper presents a review of various state-of-the-art deep learning-based techniques proposed for human action recognition on the three types of datasets.  ...  There are many challenges involved in human action recognition in videos, such as cluttered backgrounds, occlusions, viewpoint variation, execution rate, and camera motion.  ...  In contrast, the paper reviews the recent developments in the use of deep learning techniques which have been applied in the human action recognition research area.  ... 
doi:10.1109/ijcnn.2017.7966210 dblp:conf/ijcnn/WuSB17 fatcat:f35o5nkxozfsrew2sgtaybofla

Action Recognition with Deep Multiple Aggregation Networks [article]

Ahmed Mazari, Hichem Sahbi
2020 arXiv   pre-print
In this paper, we introduce a novel hierarchical pooling design that captures different levels of temporal granularity in action recognition.  ...  Besides being principled and well grounded, the proposed hierarchical pooling is also video-length and resolution agnostic.  ...  CONCLUSION We introduce in this paper a temporal pyramid approach for video action recognition.  ... 
arXiv:2006.04489v1 fatcat:ybavmgy33ffflnefb4sc6z4cpu

Recognizing activities with cluster-trees of tracklets

Adrien Gaidon, Zaid Harchaoui, Cordelia Schmid
2012 Proceedings of the British Machine Vision Conference 2012  
Contrary to most approaches based on action decompositions, we propose to use the full hierarchical action structure instead of selecting a small fixed number of parts.  ...  We represent a video as a hierarchy of mid-level motion components. This hierarchy is a data-driven decomposition specific to each video.  ...  Figure 1 ), in order to build a hierarchical model of the motion content of a video. This is in contrast to existing approaches [39] that view videos as a bag of clusters.  ... 
doi:10.5244/c.26.30 dblp:conf/bmvc/GaidonHS12 fatcat:c3tgkymblvecdpvbk77f4n6hiq

Hierarchical Self-supervised Representation Learning for Movie Understanding [article]

Fanyi Xiao, Kaustav Kundu, Joseph Tighe, Davide Modolo
2022 arXiv   pre-print
Most self-supervised video representation learning approaches focus on action recognition.  ...  In contrast, in this paper we focus on self-supervised video learning for movie understanding and propose a novel hierarchical self-supervised pretraining strategy that separately pretrains each level  ...  For example, they propose models that encourage the learning of short-term appearance and motion cues, as these are the most informative for action recognition.  ... 
arXiv:2204.03101v1 fatcat:kl2xwoczfzedvd5tx452ecg2le

Action Recognition and Localization by Hierarchical Space-Time Segments

Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
2013 2013 IEEE International Conference on Computer Vision  
We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two level hierarchy.  ...  Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-art action recognition performance on two challenging  ...  Video Frame Hierarchical Segmentation For human action recognition, segments in a video frame that contain motion are useful as they may belong to moving body parts.  ... 
doi:10.1109/iccv.2013.341 dblp:conf/iccv/MaZIS13 fatcat:mdcwpudgqbhvjj375tvspn73tu

HMS: Hierarchical Modality Selection for Efficient Video Recognition [article]

Zejia Weng, Zuxuan Wu, Hengduo Li, Yu-Gang Jiang
2021 arXiv   pre-print
This paper introduces Hierarchical Modality Selection (HMS), a simple yet efficient multimodal learning framework for efficient video recognition.  ...  Videos are multimodal in nature. Conventional video recognition pipelines typically fuse multimodal features for improved performance.  ...  In contrast to conventional video recognition approaches that leverage multimodal features for all samples, we learn what modalities to use on a per-input basis.  ... 
arXiv:2104.09760v2 fatcat:js2whnimvvbhfp5uzenqu3mlvq

Human Action Recognition Using HDP by Integrating Motion and Location Information [chapter]

Yasuo Ariki, Takuya Tonaru, Tetsuya Takiguchi
2010 Lecture Notes in Computer Science  
The proposed method, unsupervised MI-HDP-LDA, was evaluated for Weizmann dataset.  ...  These are unsupervised learning, but they require the number of latent topics to be set manually.  ...  In the experiments of motion learning and recognition for Weizmann Dataset, LDA showed 61.8% recognition rate using only motion information.  ... 
doi:10.1007/978-3-642-12304-7_28 fatcat:qtndtwio75gefm7hm36xseq4s4

Language-Motivated Approaches to Action Recognition [chapter]

Manavender R. Malgireddy, I. Nwogu, V. Govindaraju
2017 Gesture Recognition  
In order to obtain statistical insight into the underlying patterns of motions in activities, we develop a dynamic, hierarchical Bayesian model which connects low-level visual features in videos with poses  ...  We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech  ...  Acknowledgments The authors wish to thank the associate editors and anonymous referees for all their advice about the structure, references, experimental illustration and interpretation of this manuscript  ... 
doi:10.1007/978-3-319-57021-1_5 fatcat:byt4ayc6nrcyfjh4o2av2btmae

A Hierarchical Representation for Future Action Prediction [chapter]

Tian Lan, Tsung-Chuan Chen, Silvio Savarese
2014 Lecture Notes in Computer Science  
We develop a max-margin learning framework for future action prediction, integrating a collection of moveme detectors in a hierarchical way.  ...  We consider inferring the future actions of people from a still image or a short video clip.  ...  We consider at most 3 pose types for each motion segment. Learning a Collection of Moveme Classifiers Given a hierarchy of movemes, we learn a classifier for each moveme in the hierarchy.  ... 
doi:10.1007/978-3-319-10578-9_45 fatcat:77eleukzgzbvbe6rkmheipu7ey