TAN: Temporal Aggregation Network for Dense Multi-label Action Recognition [article]

Xiyang Dai, Bharat Singh, Joe Yue-Hei Ng, Larry S. Davis
2018 arXiv   pre-print
Experiments show that our model is well suited for dense multi-label action recognition, which is a challenging sub-topic of action recognition that requires predicting multiple action labels in each frame  ...  We present Temporal Aggregation Network (TAN) which decomposes 3D convolutions into spatial and temporal aggregation blocks.  ...  Our approach is suitable for dense multi-label action recognition because we learn dense spatio-temporal information efficiently without the need to reduce temporal resolution.  ... 
arXiv:1812.06203v1 fatcat:7allnptupzalbnxdoshweo5rpu

A Real-time Action Representation with Temporal Encoding and Deep Compression [article]

Kun Liu, Wu Liu, Huadong Ma, Mingkui Tan, Chuang Gan
2020 arXiv   pre-print
To address this challenge, we propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.  ...  Deep neural networks have achieved remarkable success for video-based action recognition. However, most of existing approaches cannot be deployed in practice due to the high computational cost.  ...  This work is partially supported by the Funds for International Cooperation and Exchange of the National Natural Science Foundation of China (No. 61720106007), and the Funds for Creative Research Groups  ... 
arXiv:2006.09675v1 fatcat:pkk2n7ud4jcqniqj5idolxqn5y

Object Affordances Graph Network for Action Recognition

Haoliang Tan, Le Wang, Qilin Zhang, Zhanning Gao, Nanning Zheng, Gang Hua
2019 British Machine Vision Conference  
With the spatio-temporal co-occurrences between human and objects captured, the Object Affordances Graph Network (OAGN) is subsequently proposed.  ...  To provide a fair evaluation of the role that object affordances could play on human action recognition, we have assembled a new dataset with additional annotated object bounding-boxes to account for human-object  ...  These two features are subsequently concatenated as the final video representation for action category prediction. We use sigmoid-based classifier for multi-label prediction.  ... 
dblp:conf/bmvc/TanWZGZH19 fatcat:qkvynakctbhptlryqcxtaokudu

Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition [article]

Mei Chee Leong, Hui Li Tan, Haosong Zhang, Liyuan Li, Feng Lin, Joo Hwee Lim
2021 arXiv   pre-print
Inspired by the recently proposed hierarchy representation of fine-grained actions in FineGym and SlowFast network for action recognition, we propose a novel multi-task network which exploits the FineGym  ...  The multi-task network consists of three pathways of SlowOnly networks with gradually increased frame rates for events, sets and elements of fine-grained actions, followed by our proposed integration layers  ...  TRN [9] introduces a temporal relational reasoning module that allows aggregation of multi-scale temporal relations between frames.  ... 
arXiv:2110.05853v1 fatcat:z3zg3eam3jdeli2tkqhh7bdrhy

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition [article]

Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen
2020 arXiv   pre-print
Though action recognition in videos has achieved great success recently, it remains a challenging task due to the massive computational cost.  ...  Designing lightweight networks is a possible solution, but it may degrade the recognition performance.  ...  In [13], the authors propose to adaptively determine the network depth for different images and a multi-scale dense network is designed for image classification.  ... 
arXiv:2002.03342v1 fatcat:7fu3u6hmirg4dp7jam2yfaznmm

MoViNets: Mobile Video Networks for Efficient Video Recognition [article]

Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong
2021 arXiv   pre-print
We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference. 3D convolutional neural networks (CNNs  ...  These three progressive techniques allow MoViNets to achieve state-of-the-art accuracy and efficiency on the Kinetics, Moments in Time, and Charades video action recognition datasets.  ...  for human action recognition.  ... 
arXiv:2103.11511v2 fatcat:gahsdk5ep5edtn5ubymvlwli4u

Relaxed Transformer Decoders for Direct Action Proposal Generation [article]

Jing Tan, Jiaqi Tang, Limin Wang, Gangshan Wu
2021 arXiv   pre-print
This paper presents a simple and efficient framework (RTD-Net) for direct action proposal generation, by re-purposing a Transformer-alike architecture.  ...  Temporal action proposal generation is an important and challenging task in video understanding, which aims at detecting all temporal segments containing action instances of interest.  ...  In addition to provide semantic labels for trimmed videos, action recognition is also eligible for extracting snippet-level features in untrimmed videos, which are used in downstream tasks, such as temporal  ... 
arXiv:2102.01894v3 fatcat:lxzztsicyfgmtnd5a7rksukxsu

Deep 3D human pose estimation: A review

Jinbao Wang, Shujie Tan, Xiantong Zhen, Shuo Xu, Feng Zheng, Zhenyu He, Ling Shao
2021 Computer Vision and Image Understanding  
Our key idea is to use a multi-way matching algorithm to cluster the detected 2D poses in all views.  ...  However, there is still significant room for improvement.  ...  Specifically, to deal with occlusion, Cheng et al. (2020) apply data augmentation and multi-scale spatial features for 2D keypoints prediction in each frame, and multi-stride temporal convolutional networks  ... 
doi:10.1016/j.cviu.2021.103225 fatcat:hvlgjuxd2zfgji6k4y4g65cs7y

Coherent Loss: A Generic Framework for Stable Video Segmentation [article]

Mingyang Qian, Yi Fu, Xiao Tan, Yingying Li, Jinqing Qi, Huchuan Lu, Shilei Wen, Errui Ding
2020 arXiv   pre-print
Video segmentation approaches are of great importance for numerous vision tasks especially in video manipulation for entertainment.  ...  In particular, we propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts, which combines with high accuracy and high consistency.  ...  (Huang et al. 2020) propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance. David et al.  ... 
arXiv:2010.13085v1 fatcat:hgra5ikjzfbchaj62h5n7mttam

Actor-Action Video Classification CSC 249/449 Spring 2020 Challenge Report [article]

Jing Shi, Zhiheng Li, Haitian Zheng, Yihang Xu, Tianyou Xiao, Weitao Tan, Xiaoning Guo, Sizhe Li, Bin Yang, Zhexin Xu, Ruitao Lin, Zhongkai Shangguan (+21 others)
2020 arXiv   pre-print
This technical report summarizes submissions and compiles from Actor-Action video classification challenge held as a final project in CSC 249/449 Machine Vision course (Spring 2020) at University of Rochester  ...  Two-stream convolutional networks for action recognition in videos, 2014.  ...  performing multi-label actor-action classification.  ... 
arXiv:2008.00141v2 fatcat:duben6xzr5ajhciyoczsyo23du

2020 Index IEEE Transactions on Image Processing Vol. 29

2020 IEEE Transactions on Image Processing  
., +, TIP 2020 5396-5407 Multi-Scale Multi-View Deep Feature Aggregation for Food Recognition.  ...  ., +, TIP 2020 1016-1029 Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition. Liu, Y., +, TIP 2020 3168-3182 Deep Ranking for Image Zero-Shot Multi-Label Classification.  ... 
doi:10.1109/tip.2020.3046056 fatcat:24m6k2elprf2nfmucbjzhvzk3m

SegTAD: Precise Temporal Action Detection via Semantic Segmentation [article]

Chen Zhao, Merey Ramazanova, Mengmeng Xu, Bernard Ghanem
2022 arXiv   pre-print
To address these issues and precisely model temporal action detection, we formulate the task of temporal action detection in a novel perspective of semantic segmentation.  ...  Temporal action detection (TAD) is an important yet challenging task in video analysis.  ...  To this end, various tasks have emerged, for example, action recognition [17], spatial-temporal action detection [53], temporal action localization [24, 37].  ... 
arXiv:2203.01542v1 fatcat:oyn4rg4wg5eivimd3tnc4wjswa

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video [article]

Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, Yingying Li, Errui Ding, Liang Lin
2021 arXiv   pre-print
To address this challenging task, we propose a dual-branch network which takes as input the proposals with multi-granularities in both spatial-temporal domains.  ...  ., ST-UCF-Crime and STRA, consisting of videos containing spatio-temporal abnormal annotations to serve as the benchmarks for WSSTAD.  ...  Our ranking loss in MIL builds upon the existing observations that the learned concepts from the tube and temporal branch are often complementary to each other in action recognition and localization task  ... 
arXiv:2108.03825v1 fatcat:q4wkot3ywrc3pogbwuii5bqqzm

2020 Index IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 42

2021 IEEE Transactions on Pattern Analysis and Machine Intelligence  
., +, TPAMI Jan. 2020 203-220 2020 1981-1995 RefineNet: Multi-Path Refinement Networks for Dense Prediction.  ...  ., +, TPAMI May 2020 1218-1227 RefineNet: Multi-Path Refinement Networks for Dense Prediction. Lin, G., +, TPAMI May 2020 1228-1242 Robust RGB-D Face Recognition Using Attribute-Aware Loss.  ...  Object recognition Adversarial Action Prediction Networks. Kong, Y., +, TPAMI March 2020 539-553 Capturing the Geometry of Object Categories from Video Supervision.  ... 
doi:10.1109/tpami.2020.3036557 fatcat:3j6s2l53x5eqxnlsptsgbjeebe

Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) Model for Human Action Recognition

Saima Nazir, Muhammad Haroon Yousaf, Jean-Christophe Nebel, Sergio A. Velastin
2019 Sensors  
Furthermore, we train a multi-class Support Vector Machine (SVM) for classifying bag of expressions into action classes.  ...  In this paper, we propose a Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition without compromising the strengths of the classical bag of visual words approach.  ...  Action Recognition Since we use a multi-class support vector machine model for action recognition, we need to select both a kernel function and a multi-class model.  ... 
doi:10.3390/s19122790 fatcat:37k7ahzbyrcq5f6e3chtnllrcm