A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
TAN: Temporal Aggregation Network for Dense Multi-label Action Recognition
[article] · 2018 · arXiv pre-print
Experiments show that our model is well suited for dense multi-label action recognition, which is a challenging sub-topic of action recognition that requires predicting multiple action labels in each frame ...
We present Temporal Aggregation Network (TAN) which decomposes 3D convolutions into spatial and temporal aggregation blocks. ...
Our approach is suitable for dense multi-label action recognition because we learn dense spatio-temporal information efficiently without the need to reduce temporal resolution. ...
arXiv:1812.06203v1
fatcat:7allnptupzalbnxdoshweo5rpu
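The core idea quoted above, decomposing a 3-D convolution into separate spatial and temporal blocks, can be sketched in plain NumPy. This is an illustrative factorization only, not the TAN architecture itself; all function names and sizes here are invented:

```python
import numpy as np

def spatial_conv(clip, k2d):
    """Apply the same 2-D 'valid' convolution to every frame.
    clip: (T, H, W), k2d: (kh, kw) -> (T, H-kh+1, W-kw+1)."""
    T, H, W = clip.shape
    kh, kw = k2d.shape
    out = np.zeros((T, H - kh + 1, W - kw + 1))
    for t in range(T):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(clip[t, i:i+kh, j:j+kw] * k2d)
    return out

def temporal_conv(clip, k1d):
    """1-D 'valid' convolution over the time axis at every pixel.
    clip: (T, H, W), k1d: (kt,) -> (T-kt+1, H, W)."""
    kt = k1d.shape[0]
    out = np.zeros((clip.shape[0] - kt + 1,) + clip.shape[1:])
    for t in range(out.shape[0]):
        out[t] = np.tensordot(k1d, clip[t:t+kt], axes=(0, 0))
    return out

# A 16-frame 32x32 clip: one 3x3x3 kernel is replaced by a 1x3x3
# spatial block followed by a 3x1x1 temporal block.
clip = np.random.rand(16, 32, 32)
feat = spatial_conv(clip, np.ones((3, 3)) / 9.0)   # (16, 30, 30)
feat = temporal_conv(feat, np.ones(3) / 3.0)       # (14, 30, 30)
print(feat.shape)
```

The payoff of such a factorization is parameter count: a 3x3x3 kernel has 27 weights, while the 1x3x3 plus 3x1x1 pair has 12, and the temporal block keeps full temporal resolution until it is explicitly aggregated.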
A Real-time Action Representation with Temporal Encoding and Deep Compression
[article] · 2020 · arXiv pre-print
To address this challenge, we propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation. ...
Deep neural networks have achieved remarkable success for video-based action recognition. However, most of existing approaches cannot be deployed in practice due to the high computational cost. ...
This work is partially supported by the Funds for International Cooperation and Exchange of the National Natural Science Foundation of China (No. 61720106007), and the Funds for Creative Research Groups ...
arXiv:2006.09675v1
fatcat:pkk2n7ud4jcqniqj5idolxqn5y
Object Affordances Graph Network for Action Recognition
2019 · British Machine Vision Conference
With the spatio-temporal co-occurrences between human and objects captured, the Object Affordances Graph Network (OAGN) is subsequently proposed. ...
To provide a fair evaluation of the role that object affordances could play on human action recognition, we have assembled a new dataset with additional annotated object bounding-boxes to account for human-object ...
These two features are subsequently concatenated as the final video representation for action category prediction. We use sigmoid-based classifier for multi-label prediction. ...
dblp:conf/bmvc/TanWZGZH19
fatcat:qkvynakctbhptlryqcxtaokudu
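The sigmoid-based multi-label classifier mentioned in the snippet can be illustrated in a few lines of NumPy. This is a generic sketch of the technique, not the OAGN model; the threshold and logits are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_label_predict(logits, threshold=0.5):
    """Sigmoid-based multi-label prediction: each class gets an
    independent probability, so several action labels can fire for
    the same clip (unlike softmax, which forces a single winner)."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return (probs >= threshold).astype(int)

# Logits for four action classes in one video clip.
pred = multi_label_predict([2.2, -1.0, 0.3, -3.0])
print(pred)  # [1 0 1 0]
```

Training such a head uses one binary cross-entropy term per class, which is why it pairs naturally with the multi-label datasets discussed in several entries on this page.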
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition
[article] · 2021 · arXiv pre-print
Inspired by the recently proposed hierarchy representation of fine-grained actions in FineGym and SlowFast network for action recognition, we propose a novel multi-task network which exploits the FineGym ...
The multi-task network consists of three pathways of SlowOnly networks with gradually increased frame rates for events, sets and elements of fine-grained actions, followed by our proposed integration layers ...
TRN [9] introduces a temporal relational reasoning module that allows aggregation of multi-scale temporal relations between frames. ...
arXiv:2110.05853v1
fatcat:z3zg3eam3jdeli2tkqhh7bdrhy
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
[article] · 2020 · arXiv pre-print
Though action recognition in videos has achieved great success recently, it remains a challenging task due to the massive computational cost. ...
Designing lightweight networks is a possible solution, but it may degrade the recognition performance. ...
In [13] , the authors propose to adaptively determine the network depth for different images and a multi-scale dense network is designed for image classification. ...
arXiv:2002.03342v1
fatcat:7fu3u6hmirg4dp7jam2yfaznmm
MoViNets: Mobile Video Networks for Efficient Video Recognition
[article] · 2021 · arXiv pre-print
We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference. 3D convolutional neural networks (CNNs ...
These three progressive techniques allow MoViNets to achieve state-of-the-art accuracy and efficiency on the Kinetics, Moments in Time, and Charades video action recognition datasets. ...
for human action recognition. ...
arXiv:2103.11511v2
fatcat:gahsdk5ep5edtn5ubymvlwli4u
Relaxed Transformer Decoders for Direct Action Proposal Generation
[article] · 2021 · arXiv pre-print
This paper presents a simple and efficient framework (RTD-Net) for direct action proposal generation, by re-purposing a Transformer-alike architecture. ...
Temporal action proposal generation is an important and challenging task in video understanding, which aims at detecting all temporal segments containing action instances of interest. ...
In addition to providing semantic labels for trimmed videos, action recognition can also be used to extract snippet-level features in untrimmed videos, which are used in downstream tasks, such as temporal ...
arXiv:2102.01894v3
fatcat:lxzztsicyfgmtnd5a7rksukxsu
Deep 3D human pose estimation: A review
2021 · Computer Vision and Image Understanding
Our key idea is to use a multi-way matching algorithm to cluster the detected 2D poses in all views. ...
However, there is still significant room for improvement. ...
Specifically, to deal with occlusion, Cheng et al. (2020) apply data augmentation and multi-scale spatial features for 2D keypoints prediction in each frame, and multi-stride temporal convolutional networks ...
doi:10.1016/j.cviu.2021.103225
fatcat:hvlgjuxd2zfgji6k4y4g65cs7y
Coherent Loss: A Generic Framework for Stable Video Segmentation
[article] · 2020 · arXiv pre-print
Video segmentation approaches are of great importance for numerous vision tasks especially in video manipulation for entertainment. ...
In particular, we propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts, which combines with high accuracy and high consistency. ...
(Huang et al. 2020 ) propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance. David et al. ...
arXiv:2010.13085v1
fatcat:hgra5ikjzfbchaj62h5n7mttam
Actor-Action Video Classification CSC 249/449 Spring 2020 Challenge Report
[article] · 2020 · arXiv pre-print
This technical report summarizes and compiles submissions from the Actor-Action video classification challenge held as a final project in the CSC 249/449 Machine Vision course (Spring 2020) at the University of Rochester ...
Two-stream convolutional networks for action recognition in videos, 2014. ...
performing multi-label actor-action classification. ...
arXiv:2008.00141v2
fatcat:duben6xzr5ajhciyoczsyo23du
2020 Index IEEE Transactions on Image Processing Vol. 29
2020 · IEEE Transactions on Image Processing
., +, TIP 2020 5396-5407 Multi-Scale Multi-View Deep Feature Aggregation for Food Recognition. ...
., +, TIP 2020 1016-1029 Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition. Liu, Y., +, TIP 2020 3168-3182 Deep Ranking for Image Zero-Shot Multi-Label Classification. ...
doi:10.1109/tip.2020.3046056
fatcat:24m6k2elprf2nfmucbjzhvzk3m
SegTAD: Precise Temporal Action Detection via Semantic Segmentation
[article] · 2022 · arXiv pre-print
To address these issues and precisely model temporal action detection, we formulate the task of temporal action detection in a novel perspective of semantic segmentation. ...
Temporal action detection (TAD) is an important yet challenging task in video analysis. ...
To this end, various tasks have emerged, for example, action recognition [17] , spatial-temporal action detection [53] , temporal action localization [24, 37] . ...
arXiv:2203.01542v1
fatcat:oyn4rg4wg5eivimd3tnc4wjswa
Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video
[article] · 2021 · arXiv pre-print
To address this challenging task, we propose a dual-branch network which takes as input the proposals with multi-granularities in both spatial-temporal domains. ...
., ST-UCF-Crime and STRA, consisting of videos containing spatio-temporal abnormal annotations to serve as the benchmarks for WSSTAD. ...
Our ranking loss in MIL builds upon the existing observations that the learned concepts from the tube and temporal branch are often complementary to each other in action recognition and localization task ...
arXiv:2108.03825v1
fatcat:q4wkot3ywrc3pogbwuii5bqqzm
2020 Index IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 42
2021 · IEEE Transactions on Pattern Analysis and Machine Intelligence
., +, TPAMI Jan. 2020 203-220; 2020 1981-1995 ...
RefineNet: Multi-Path Refinement Networks for Dense Prediction. ...
., +, TPAMI May 2020 1218-1227 RefineNet: Multi-Path Refinement Networks for Dense Prediction. Lin, G., +, TPAMI May 2020 1228-1242 Robust RGB-D Face Recognition Using Attribute-Aware Loss. ...
Object recognition Adversarial Action Prediction Networks. Kong, Y., +, TPAMI March 2020 539-553 Capturing the Geometry of Object Categories from Video Supervision. ...
doi:10.1109/tpami.2020.3036557
fatcat:3j6s2l53x5eqxnlsptsgbjeebe
Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) Model for Human Action Recognition
2019 · Sensors
Furthermore, we train a multi-class Support Vector Machine (SVM) for classifying bag of expressions into action classes. ...
In this paper, we propose a Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition without compromising the strengths of the classical bag of visual words approach. ...
Action Recognition Since we use a multi-class support vector machine model for action recognition, we need to select both a kernel function and a multi-class model. ...
doi:10.3390/s19122790
fatcat:37k7ahzbyrcq5f6e3chtnllrcm
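The classical bag-of-visual-words encoding that the D-STBoE snippet builds on can be sketched as follows. This is a generic illustration of the histogram step an SVM would then classify, not the D-STBoE model; descriptor and codebook sizes are invented:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Bag-of-visual-words encoding: assign each local descriptor to
    its nearest codeword and return a normalized histogram of
    codeword counts (the video-level feature vector)."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
descs = rng.normal(size=(200, 16))   # 200 local spatio-temporal descriptors
codebook = rng.normal(size=(8, 16))  # 8 learned codewords
h = bow_histogram(descs, codebook)
print(h.shape)
```

A multi-class SVM (e.g. one-vs-rest over these histograms) then maps each normalized histogram to an action class, which is the final step the abstract describes.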
Showing results 1 — 15 out of 1,562 results