
Lattice Long Short-Term Memory for Human Action Recognition [article]

Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese
2017 arXiv   pre-print
RNNs, especially Long Short-Term Memory (LSTM), are able to learn temporal motion dynamics.  ...  This assumption is valid for short-term motions but invalid when the duration of the motion is long.  ...  In order to model the dynamics between frames, recurrent neural networks (RNNs), particularly long short-term memory (LSTM), have been considered for video based human action recognition.  ... 
arXiv:1708.03958v1 fatcat:4jovclo32ndmhglmrsd2agmfam

Memory-Augmented Temporal Dynamic Learning for Action Recognition

Yuan Yuan, Dong Wang, Qi Wang
2019 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
Human actions captured in video sequences contain two crucial factors for action recognition, i.e., visual appearance and motion dynamics.  ...  However, CNN based methods are limited in modeling long-term motion dynamics.  ...  action recognition in (Long et al. 2018a) Approach The primary goal of this paper is to enhance the model's capacity for learning long-term and complex motion for action recognition in videos, by  ... 
doi:10.1609/aaai.v33i01.33019167 fatcat:gqbvsqbbqjfrjlia2k7tsscisq

Memory-Augmented Temporal Dynamic Learning for Action Recognition [article]

Yuan Yuan and Dong Wang and Qi Wang
2019 arXiv   pre-print
Human actions captured in video sequences contain two crucial factors for action recognition, i.e., visual appearance and motion dynamics.  ...  However, CNN based methods are limited in modeling long-term motion dynamics.  ...  action recognition in (Long et al. 2018a) Approach The primary goal of this paper is to enhance the model's capacity for learning long-term and complex motion for action recognition in videos, by  ... 
arXiv:1904.13080v1 fatcat:vsb3ufzxqbbqrby7lw4pavemoe

A Review of Deep Learning-based Human Activity Recognition on Benchmark Video Datasets

Vijeta Sharma, Manjari Gupta, Anil Kumar Pandey, Deepti Mishra, Ajai Kumar
2022 Applied Artificial Intelligence  
Most of them are behavior analysis, scene understanding, scene labeling, human activity recognition (HAR), object localization, and event recognition.  ...  Finally, we discuss future research directions and some open challenges on human activity recognition.  ...  Particularly, Long Short-Term Memory (LSTMs) have outperformed on video data for human action recognition.  ... 
doi:10.1080/08839514.2022.2093705 fatcat:6on4g3sp3vaktnyyrk72k4mqta

Aggressive Action Estimation: A Comprehensive Review on Neural Network Based Human Segmentation and Action Recognition

A. F. M. Saifuddin Saif, Md. Akib Shahriar Khan, Abir Mohammad Hadi, Rahul Prashad Karmoker, Joy Julian Gomes
2019 International Journal of Education and Management Engineering  
Human action recognition has been a talked topic since machine vision was coined.  ...  Critical review of papers provided in this work can contribute significantly in addressing human action recognition problem as a whole.  ...  CNN with RNN and LSTM [6, 7, 11] Recurrent Neural Network with Long Short-Term Memory 1.  ... 
doi:10.5815/ijeme.2019.01.02 fatcat:xsb5jgo2zjbctacyj2mtcccfk4

IF-TTN: Information Fused Temporal Transformation Network for Video Action Recognition [article]

Ke Yang, Peng Qiao, Dongsheng Li, Yong Dou
2019 arXiv   pre-print
In the network, Information Fusion Module (IFM) is designed to fuse the appearance and motion features at multiple ConvNet levels for each video snippet, forming a short-term video descriptor.  ...  Focusing on discriminate spatiotemporal feature learning, we propose Information Fused Temporal Transformation Network (IF-TTN) for action recognition on top of popular Temporal Segment Network (TSN) framework  ...  In [30] and [3], Long Short-Term Memory (LSTM) networks were used to capture the long-range dynamics for action recognition.  ... 
arXiv:1902.09928v2 fatcat:anza23kv2zhwnotqdu5avm4e4e

Temporal Segment Connection Network for Action Recognition

Qian Li, Wenzhu Yang, Xiangyang Chen, Tongtong Yuan, Yuxia Wang
2020 IEEE Access  
, thus significantly improves the accuracy of human action recognition.  ...  On the one hand, the forget gate module of the long short-term memory (LSTM) network is used to establish feature-level connections between each sampling group.  ...  The author thanks the editor and reviewers for their work on this manuscript.  ... 
doi:10.1109/access.2020.3027386 fatcat:g5lrv3dfcjdnnjuyjsapsy6h6q

Pedestrian Behavior Recognition Based on Improved Dual-stream Network with Differential Feature in Surveillance Video

Yonghong Tan, Xuebin Zhou, Aiwu Chen, Songqing Zhou, Yi-Zhang Jiang
2021 Scientific Programming  
decision-making level feature fusion mechanism is used to train the model, which can retain the spatiotemporal characteristics of images between different network frames to a greater extent and reflect the action  ...  In order to improve the pedestrian behavior recognition accuracy of video sequences in complex background, an improved spatial-temporal two-stream network is proposed in this paper.  ...  Long Short-Term Memory (LSTM).  ... 
doi:10.1155/2021/3279957 fatcat:sh6dxvikfzglfoomsxoeig4dgm

Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review

Dengshan Li, Rujing Wang, Peng Chen, Chengjun Xie, Qiong Zhou, Xiufang Jia
2021 Micromachines  
of utilizing temporal information of adjacent video frames are mainly the optical flow method, Long Short-Term Memory and convolution among adjacent frames.  ...  Human action recognition is the detection of human actions. Usually, video detection is more challenging than image detection, since video frames are often more blurry than images.  ...  Association Long Short-Term Memory (Association LSTM): Long short-term memory (LSTM) [16] is suitable for learning the features with temporal information, because of the connectivity of the structure.  ... 
doi:10.3390/mi13010072 pmid:35056238 pmcid:PMC8781209 fatcat:kdc5msiv2rd7zh7qlxymbpdk3y

Class structure‐aware adversarial loss for cross‐domain human action recognition

Wanjun Chen, Long Liu, Guangfeng Lin, Yajun Chen, Jing Wang
2021 IET Image Processing  
Cross-domain action recognition is a challenging vision task due to the domain shift and the absence of labeled data in the target domain.  ...  In order to accurately model long-term and complex motions, Lattice-LSTM [44] extends LSTM by learning independent hidden state transitions of memory cells for individual spatial locations.  ...  (LSTM). long-term recurrent convolutional network [42] and beyond-short-snippets [43] are the pioneer works that use LSTM for video action recognition in the two-stream network setting.  ... 
doi:10.1049/ipr2.12309 fatcat:kaptzeq54rbalbxi6wcwrskwfa

Vision Transformer and Deep Sequence Learning for Human Activity Recognition in Surveillance Videos

Altaf Hussain, Tanveer Hussain, Waseem Ullah, Sung Wook Baik, Bai Yuan Ding
2022 Computational Intelligence and Neuroscience  
In the proposed framework, the frame-level features are extracted via pretrained Vision Transformer; next, these features are passed to multilayer long short-term memory to capture the long-range dependencies  ...  Human Activity Recognition is an active research area with several Convolutional Neural Network (CNN) based features extraction and classification methods employed for surveillance and other applications  ...  short-term memory (LSTM) Network [20] and Gated Recurrent Unit (GRU) [21] to improve the HAR performance.  ... 
doi:10.1155/2022/3454167 pmid:35419045 pmcid:PMC9001125 fatcat:xljtyzzzgrhnnf3fg4h3ihzgs4

A Comprehensive Study of Deep Video Action Recognition [article]

Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
2020 arXiv   pre-print
Video action recognition is one of the representative tasks for video understanding.  ...  In this paper, we provide a comprehensive survey of over 200 existing papers on deep learning for video action recognition.  ...  Acknowledgement We would like to thank Peter Gehler, Linchao Zhu and Thomas Brady for constructive feedback and fruitful discussions.  ... 
arXiv:2012.06567v1 fatcat:plqytbfck5bcndiceshix5unpa

Improving Transcription of Manuscripts with Multimodality and Interaction

Emilio Granell, Carlos David Martinez Hinarejos, Verónica Romero
2018 IberSPEECH 2018  
Besides, this effort reduction is increased when using speech dictations over an Automatic Speech Recognition system, allowing for a faster transcription process.  ...  Therefore, the supervision of those drafts by a human transcriber is still necessary to obtain the correct transcription.  ...  Each recurrent layer is composed of 256 Bidirectional Long-Short Term Memory (BLSTM) units. Finally, a linear fully-connected output layer is used after the recurrent block.  ... 
doi:10.21437/iberspeech.2018-20 dblp:conf/iberspeech/GranellM018 fatcat:yro7zmp2zrherl5hf3qmkfc64q

Situated robot learning for multi-modal instruction and imitation of grasping

J. Steil
2004 Robotics and Autonomous Systems  
It has the long-term goal to demonstrate speech-supported imitation learning of robot actions.  ...  We describe the current state of its realization to enable imitation of human hand postures for flexible grasping and give quantitative results for grasping a broad range of everyday objects.  ...  A holistic, neural object recognition system [18] determines whether a known object has been seen and can be transferred into the short-term memory of the integration module.  ... 
doi:10.1016/s0921-8890(04)00043-0 fatcat:bb2a3okcfjdc7lc2uaurxxuhtq

Situated robot learning for multi-modal instruction and imitation of grasping

J.J. Steil, F. Röthling, R. Haschke, H. Ritter
2004 Robotics and Autonomous Systems  
It has the long-term goal to demonstrate speech-supported imitation learning of robot actions.  ...  We describe the current state of its realization to enable imitation of human hand postures for flexible grasping and give quantitative results for grasping a broad range of everyday objects.  ...  A holistic, neural object recognition system [18] determines whether a known object has been seen and can be transferred into the short-term memory of the integration module.  ... 
doi:10.1016/j.robot.2004.03.007 fatcat:qsmd3mizcnb4ppmcbexbehjjjm