UCF-STAR: A Large Scale Still Image Dataset for Understanding Human Actions
2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and the Thirty-Second Innovative Applications of Artificial Intelligence Conference
Action recognition in still images poses a great challenge due to (i) the scarcity of training data and (ii) the absence of temporal information. ...
UCF-STAR is the largest dataset in the literature for action recognition in still images. ...
TSSTN proves that predicting the "latent" temporal information in still images improves action recognition performance. ...
doi:10.1609/aaai.v34i03.5653
fatcat:52c6soxtuze7zeso3ebwxdnxfu
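As an illustration of the idea in the abstract, here is a minimal, hypothetical sketch (not the TSSTN architecture itself): an appearance CNN backbone plus a small head that predicts "latent" temporal features from a still image, concatenated for classification. All module names and dimensions are assumptions.

```python
# Hypothetical sketch, not the TSSTN code: a still-image classifier with an
# auxiliary branch that hallucinates latent temporal features from the
# appearance features. Sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class LatentTemporalClassifier(nn.Module):
    def __init__(self, num_classes: int, latent_dim: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.appearance = nn.Sequential(*list(backbone.children())[:-1])  # (B, 512, 1, 1)
        # Branch that predicts latent temporal (motion-like) features from appearance.
        self.temporal_head = nn.Sequential(
            nn.Linear(512, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )
        self.classifier = nn.Linear(512 + latent_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        app = self.appearance(x).flatten(1)   # appearance features
        motion = self.temporal_head(app)      # hallucinated temporal features
        return self.classifier(torch.cat([app, motion], dim=1))

logits = LatentTemporalClassifier(num_classes=20)(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 20])
```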
Human Behavior Analysis: A Survey on Action Recognition
2021
Applied Sciences
The visual recognition and understanding of human actions remain an active research domain of computer vision, being the scope of various research works over the last two decades. ...
Previous surveys mainly focus on the evolution of this field, from handcrafted features to deep learning architectures. ...
[13] presented an extensive and complete survey covering not only action recognition but also action prediction, tracing the evolution of the state of the art on both problems. ...
doi:10.3390/app11188324
fatcat:zenvfhlaubht7ar3qrpil4lgdm
Multi-Modality Video Representation for Action Recognition
2020
Journal on Big Data
The difference between image recognition and action recognition is that action recognition needs more modalities to depict one action, such as appearance, motion and dynamic information ...
Most current methods define an action by spatial information and motion information. ...
Moreover, unlike image classification, action recognition needs both spatial and temporal cues. ...
doi:10.32604/jbd.2020.010431
fatcat:mkzehr6jl5hf5auapqs3daug7q
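The baseline this snippet alludes to, combining a spatial (appearance) stream with a motion stream, can be illustrated with simple score-level late fusion. The sketch below assumes both streams already emit class logits; it is not taken from the paper.

```python
# Illustrative score-level late fusion of two modality streams; the weight w
# and the class count are assumptions.
import torch
import torch.nn.functional as F

def late_fusion(appearance_logits: torch.Tensor,
                motion_logits: torch.Tensor,
                w: float = 0.5) -> torch.Tensor:
    """Average the class posteriors of the two modalities."""
    return w * F.softmax(appearance_logits, dim=1) + (1 - w) * F.softmax(motion_logits, dim=1)

probs = late_fusion(torch.randn(4, 101), torch.randn(4, 101))
pred = probs.argmax(dim=1)  # fused action prediction per video
```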
Action Recognition Using Single-Pixel Time-of-Flight Detection
2019
Entropy
The data trace recorded for one action contains a sequence of one-dimensional arrays of voltage values acquired by a single-pixel detector at a 1 GHz repetition rate. ...
Our proposed method relies only on recording the temporal evolution of light pulses scattered back from the scene. ...
The spatial properties of the scene are imprinted into the temporal evolution of the recorded trace. ...
doi:10.3390/e21040414
pmid:33267128
fatcat:xhujztfdj5b4revpbyemwuyxkq
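A minimal sketch of how such one-dimensional voltage traces could be classified with a small temporal CNN, assuming a (batch, 1, samples) input; the layer sizes and trace length are illustrative, not the paper's.

```python
# Hedged sketch: a 1D CNN over single-pixel time-of-flight voltage traces.
import torch
import torch.nn as nn

class TraceClassifier(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over the temporal axis
        )
        self.fc = nn.Linear(64, num_actions)

    def forward(self, trace: torch.Tensor) -> torch.Tensor:
        # trace: (batch, 1, samples) voltage values from the single-pixel detector
        return self.fc(self.features(trace).squeeze(-1))

logits = TraceClassifier(num_actions=6)(torch.randn(8, 1, 4096))
```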
DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding
[article]
2018
arXiv
pre-print
Extensive experiments on two recent challenging benchmarks demonstrate that our DenseImage Network can accurately capture the common spatial-temporal evolution between similar actions, even with enormous ...
Many of the leading approaches for video understanding are data-hungry and time-consuming, failing to capture the gist of spatial-temporal evolution in an efficient manner. ...
The success of static image classification with CNNs has driven the development of video recognition, but how to represent spatial-temporal evolution in videos using CNNs is still a problem. ...
arXiv:1805.07550v1
fatcat:tp6q3amuhvf5nm3anghg3ex47m
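The dense-image idea, as the snippet describes it, can be sketched as: encode each frame into a feature vector, stack those vectors as columns of a 2D map, and let an ordinary 2D CNN read spatial-temporal evolution from that map. The per-frame encoder and all sizes below are assumptions.

```python
# Hedged sketch of a dense-image-style encoding, not the paper's network.
import torch
import torch.nn as nn

frame_encoder = nn.Sequential(      # stand-in per-frame CNN: (3, H, W) -> (C,)
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

def dense_image(video: torch.Tensor) -> torch.Tensor:
    # video: (T, 3, H, W) -> single-channel "dense image" (1, C, T),
    # one per-frame feature vector per column
    cols = frame_encoder(video)     # (T, C)
    return cols.t().unsqueeze(0)    # (1, C, T)

temporal_reader = nn.Sequential(    # small 2D CNN over the dense image
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)

img = dense_image(torch.randn(16, 3, 64, 64))   # (1, 64, 16)
logits = temporal_reader(img.unsqueeze(0))      # add batch dim -> (1, 10)
```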
Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video
[article]
2016
arXiv
pre-print
Recent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. ...
For the task of capturing temporal structure in video, however, there still remain numerous open research questions. ...
The research leading to these results has received funding from the Agency for Innovation by Science and Technology in Flanders (IWT). ...
arXiv:1506.01911v3
fatcat:fgug5fhirrekplsnerrwgvr6be
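One of the alternatives to temporal pooling that this paper studies is temporal convolution over per-frame features. A minimal sketch, assuming features of shape (batch, T, feat_dim); all sizes are made up.

```python
# Hedged sketch: temporal convolutions instead of plain temporal pooling.
import torch
import torch.nn as nn

class TemporalConvHead(nn.Module):
    def __init__(self, feat_dim: int, num_gestures: int):
        super().__init__()
        self.tconv = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(feat_dim, num_gestures)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, T, feat_dim) per-frame CNN features
        h = self.tconv(feats.transpose(1, 2))   # convolve along time
        return self.fc(h.mean(dim=2))           # pool only after temporal modeling

logits = TemporalConvHead(feat_dim=128, num_gestures=20)(torch.randn(4, 32, 128))
```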
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction
[article]
2019
arXiv
pre-print
Our extensive experiments successfully demonstrate the effectiveness of the proposed framework on action recognition, leading to significant improvements over the state-of-the-art self-supervised methods ...
With the self-supervised pre-trained 3DRotNet from large datasets, the recognition accuracy is boosted by 20.4% on UCF101 and 16.7% on HMDB51, respectively, compared to models trained from scratch ...
This material is based upon the work supported by National Science Foundation (NSF) under award number IIS-1400802. ...
arXiv:1811.11387v2
fatcat:yhx7laoxv5hnvoqbi4dghkc5ui
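The pretext task described here, predicting how a clip has been rotated, can be sketched as follows: torch.rot90 on the spatial axes performs the transform, and the tiny 3D CNN is a stand-in for 3DRotNet, not the paper's network.

```python
# Hedged sketch of video rotation prediction as a self-supervised pretext task.
import torch
import torch.nn as nn

def rotate_clip(clip: torch.Tensor, k: int) -> torch.Tensor:
    # clip: (C, T, H, W); rotate all frames by k * 90 degrees in the H-W plane
    return torch.rot90(clip, k, dims=(2, 3))

net = nn.Sequential(               # tiny 3D CNN standing in for 3DRotNet
    nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, 4),
)

clip = torch.randn(3, 8, 112, 112)
label = torch.randint(0, 4, (1,))              # which rotation was applied
x = rotate_clip(clip, int(label)).unsqueeze(0) # add batch dim
loss = nn.functional.cross_entropy(net(x), label)
loss.backward()
```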
TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition
[article]
2021
arXiv
pre-print
Next, the second stage coordinates the query feature to match the spatial-temporal action evolution of the support by performing temporal rearrangement and spatial offset prediction. ...
The first stage locates the action by learning a temporal affine transform, which warps each video feature to its action duration while dismissing the action-irrelevant feature (e.g. background). ...
However, the spatial variation of actor evolution, such as the positions of actors, is also critical for action recognition and cannot be modeled by TC. ...
arXiv:2107.04782v2
fatcat:qqa4ymll2zhgvnw5qks7qz4tp4
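The first stage's temporal affine transform can be illustrated with a hedged sketch: predict a scale a and offset b per video and resample the feature sequence at positions a*t + b via grid_sample. The predictor and all dimensions are assumptions, not the paper's code.

```python
# Hedged sketch of a learned temporal affine warp over per-frame features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAffineWarp(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.predictor = nn.Linear(feat_dim, 2)   # -> (a, b) per video

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, T) per-frame features
        B, C, T = feats.shape
        ab = self.predictor(feats.mean(dim=2))    # (B, 2) from the clip average
        a = 1.0 + torch.tanh(ab[:, :1])           # scale, kept near 1
        b = torch.tanh(ab[:, 1:])                 # offset in [-1, 1]
        t = torch.linspace(-1.0, 1.0, T, device=feats.device).expand(B, T)
        x = a * t + b                             # warped sampling positions
        grid = torch.stack([x, torch.zeros_like(x)], dim=-1).unsqueeze(1)  # (B,1,T,2)
        warped = F.grid_sample(feats.unsqueeze(2), grid, align_corners=True)
        return warped.squeeze(2)                  # (B, C, T)

out = TemporalAffineWarp(feat_dim=64)(torch.randn(2, 64, 16))
```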
Convolutional Two-Stream Network Fusion for Video Action Recognition
2016
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information. ...
We study a number of ways of fusing ConvNet towers both spatially and temporally in order to best take advantage of this spatio-temporal information. ...
This work was supported by the Austrian Science Fund (FWF) under project P27076, and also by EPSRC Programme Grant Seebibyte EP/M013774/1. The GPUs used for this research were donated by NVIDIA. ...
doi:10.1109/cvpr.2016.213
dblp:conf/cvpr/FeichtenhoferPZ16
fatcat:fv3nx5sparhb5c6vcpthy5gply
Convolutional Two-Stream Network Fusion for Video Action Recognition
[article]
2016
arXiv
pre-print
Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information. ...
We study a number of ways of fusing ConvNet towers both spatially and temporally in order to best take advantage of this spatio-temporal information. ...
This work was supported by the Austrian Science Fund (FWF) under project P27076, and also by EPSRC Programme Grant Seebibyte EP/M013774/1. The GPUs used for this research were donated by NVIDIA. ...
arXiv:1604.06573v2
fatcat:deygtaqfwnhhxg37xzk6v7bjai
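The convolutional fusion this paper studies, stacking appearance-tower and motion-tower feature maps channel-wise and fusing them with a learned convolution, reduces to a small unit like the sketch below. The towers themselves are omitted, and the 1x1 kernel is one illustrative choice among the fusion variants the paper compares.

```python
# Hedged sketch of convolutional two-stream fusion at one layer.
import torch
import torch.nn as nn

class ConvFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv learns correspondences between the stacked channels
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, spatial: torch.Tensor, temporal: torch.Tensor) -> torch.Tensor:
        # spatial, temporal: (B, C, H, W) maps at the same layer of each tower
        return self.fuse(torch.cat([spatial, temporal], dim=1))

fused = ConvFusion(512)(torch.randn(2, 512, 14, 14), torch.randn(2, 512, 14, 14))
```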
Survey On Feature Extraction Approach for Human Action Recognition in Still Images and Videos
2022
International Journal of Scientific Research in Computer Science Engineering and Information Technology
The paper presents a brief overview of features of human actions, categorizing them as still image-based and video-based. ...
The main aim of this work is to study the various action recognition techniques in videos and images. ...
Still image-based action recognition research is still in its early stages, as it is a relatively new field. ...
doi:10.32628/cseit228392
fatcat:ixbasagqvvaahfc3pmilvd7rke
Relevance of Interest Points for Eye Position Prediction on Videos
[chapter]
2009
Lecture Notes in Computer Science
We found that, depending on the video sequence, and more specifically on the motion within the sequence, the spatial or the space-time interest point detector is more or less relevant to predict ...
This paper tests the relevance of interest points for predicting the eye movements of subjects viewing video sequences freely. ...
He used this detector for the recognition of human actions (walking, running, drinking, etc.) in movies. ...
doi:10.1007/978-3-642-04667-4_33
fatcat:24ybhgg2czdepbnk7eak7u43ba
Complex sequential understanding through the awareness of spatial and temporal concepts
2020
Nature Machine Intelligence
Experiments demonstrate that a Semi-Coupled Structure can successfully annotate the outline of an object in images sequentially and perform video action recognition. ...
For sequence-to-sequence problems, a Semi-Coupled Structure can predict future meteorological radar echo images based on observed images. ...
A model, given a sequence of the radar echo images sorted in time, needs to predict a sequence of the future radar echo images from the previous evolution of CR (see Fig. 5b). ...
doi:10.1038/s42256-020-0168-3
fatcat:dyrxp3pqvvb3tmrjv2yig5egru
Im2Flow: Motion Hallucination from Static Images for Action Recognition
2018
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
We propose an approach that hallucinates the unobserved future motion implied by a single snapshot to help static-image action recognition. ...
It not only achieves state-of-the-art accuracy for dense optical flow prediction, but also consistently enhances recognition of actions and dynamic scenes. ...
Acknowledgements: This research was supported in part by an ONR PECASE Award N00014-15-1-2291 and an IBM Faculty Award and IBM Open Collaboration Award. ...
doi:10.1109/cvpr.2018.00622
dblp:conf/cvpr/GaoXG18
fatcat:ofat4dobw5aj7apx4ikmjymyqe
Im2Flow: Motion Hallucination from Static Images for Action Recognition
[article]
2018
arXiv
pre-print
We propose an approach that hallucinates the unobserved future motion implied by a single snapshot to help static-image action recognition. ...
It not only achieves state-of-the-art accuracy for dense optical flow prediction, but also consistently enhances recognition of actions and dynamic scenes. ...
Acknowledgements: This research was supported in part by an ONR PECASE Award N00014-15-1-2291 and an IBM Faculty Award and IBM Open Collaboration Award. ...
arXiv:1712.04109v3
fatcat:cnucptnuizg3rmx6k4mumzchem
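As a sketch of the input/output contract, predicting a dense flow field from one RGB image, a minimal encoder-decoder might look like this. Im2Flow itself uses a deeper hourglass-style network and a different flow encoding, so everything below is illustrative.

```python
# Hedged stand-in for an image-to-flow network: one RGB image in, a dense
# per-pixel (u, v) motion field out. Layer sizes are assumptions.
import torch
import torch.nn as nn

image_to_flow = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),            # encode
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # decode
    nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),              # (u, v) per pixel
)

flow = image_to_flow(torch.randn(1, 3, 256, 256))  # -> (1, 2, 256, 256)
```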