
Temporal Relational Reasoning in Videos [article]

Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba
2018 arXiv   pre-print
Further analyses show that the models learn intuitive and interpretable visual common sense knowledge in videos. ... We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets - Something-Something, Jester, and Charades - which fundamentally depend on temporal relational reasoning (a sketch of the relation module follows this entry) ... Activity recognition in videos is a core problem in computer vision. ...
arXiv:1711.08496v2 fatcat:hqalpk6x2vfvlibc45mzcnzeza
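
The temporal relation module this entry describes can be illustrated compactly. The sketch below is a hypothetical 2-frame relation function in the spirit of TRN; the published model combines several temporal scales, and the layer sizes, frame count, and class count here are illustrative assumptions, not the paper's configuration:

```python
# Minimal sketch of a TRN-style 2-frame relation module (illustrative
# sizes only; the full TRN aggregates relations at multiple time scales).
import torch
import torch.nn as nn

class TwoFrameRelation(nn.Module):
    def __init__(self, feat_dim=256, num_classes=174):
        super().__init__()
        # g fuses an ordered pair of frame features; h maps to class scores.
        self.g = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU())
        self.h = nn.Linear(256, num_classes)

    def forward(self, frames):  # frames: (batch, num_frames, feat_dim)
        b, t, d = frames.shape
        pair_sum = 0
        # Sum the relation function over ordered frame pairs (i < j),
        # which preserves temporal-order information.
        for i in range(t):
            for j in range(i + 1, t):
                pair = torch.cat([frames[:, i], frames[:, j]], dim=1)
                pair_sum = pair_sum + self.g(pair)
        return self.h(pair_sum)

scores = TwoFrameRelation()(torch.randn(4, 8, 256))  # -> (4, 174)
```

Summing a learned function over ordered frame pairs is what lets the network reason about temporal order rather than only pooled appearance.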

Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning

Xinchen Liu, Wu Liu, Meng Zhang, Jingwen Chen, Lianli Gao, Chenggang Yan, Tao Mei
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
The actions and storylines in videos provide important cues for social relation recognition. ... By this means, MSTR can comprehensively explore the multi-scale actions and storylines in spatial-temporal dimensions for social relation reasoning in videos. ... Traditional Convolutional Neural Networks usually apply 2-D or 3-D filters on images or videos to abstract visual features from low-level space to high-level space [10] (a shape comparison follows this entry). ...
doi:10.1109/cvpr.2019.00368 dblp:conf/cvpr/LiuLZCGYM19 fatcat:6oo75sn5ebezdnetpfgrmrxreq
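
The closing remark about 2-D versus 3-D filters is easy to see in code. A minimal comparison with illustrative tensor shapes (not the paper's configuration):

```python
# A 2-D filter slides over the spatial dimensions of an image, while a
# 3-D filter also slides across time in a video clip. Shapes are examples.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)      # (batch, channels, H, W)
clip = torch.randn(1, 3, 16, 224, 224)   # (batch, channels, T, H, W)

conv2d = nn.Conv2d(3, 64, kernel_size=3, padding=1)
conv3d = nn.Conv3d(3, 64, kernel_size=3, padding=1)

print(conv2d(image).shape)  # torch.Size([1, 64, 224, 224])
print(conv3d(clip).shape)   # torch.Size([1, 64, 16, 224, 224])
```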

Temporal Relational Reasoning in Videos [chapter]

Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba
2018 Lecture Notes in Computer Science  
Further analyses show that the models learn intuitive and interpretable visual common sense knowledge in videos. ... We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets - Something-Something, Jester, and Charades - which fundamentally depend on temporal relational reasoning ... Activity recognition in videos is a core problem in computer vision. ...
doi:10.1007/978-3-030-01246-5_49 fatcat:ndrjqj3nxbc7jcbmjxsswgcj4e

Learning Actor Relation Graphs for Group Activity Recognition

Jianchao Wu, Limin Wang, Li Wang, Jie Guo, Gangshan Wu
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
We also visualize the learned actor graphs and relation features, which demonstrate that the proposed ARG is able to capture the discriminative relation information for group activity recognition. ... Thanks to the Graph Convolutional Network, the connections in ARG could be automatically learned from group activity videos in an end-to-end manner, and the inference on ARG could be performed efficiently (a sketch follows this entry) ... Recent deep learning methods have shown promising results for group activity recognition in videos [3, 24, 45, 12, 32, 59, 23, 39]. ...
doi:10.1109/cvpr.2019.01020 dblp:conf/cvpr/WuWWGW19 fatcat:v7mkrsbjd5e7vl3zh7p5mw4n5a
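
As a rough illustration of the graph-convolution step credited above, the sketch below builds a relation (adjacency) matrix from embedded dot-product similarity between per-actor features and performs one round of message passing. The feature sizes and the similarity choice are assumptions, not the exact ARG formulation:

```python
# One relational graph-convolution step over per-actor features. The
# adjacency is computed from feature similarity and softmax-normalized,
# so the whole layer is differentiable and trainable end-to-end.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorRelationLayer(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.theta = nn.Linear(dim, dim)  # query-side embedding
        self.phi = nn.Linear(dim, dim)    # key-side embedding
        self.w = nn.Linear(dim, dim)      # per-node transform

    def forward(self, actors):  # actors: (batch, num_actors, dim)
        # Pairwise relation values from embedded dot products.
        rel = torch.bmm(self.theta(actors), self.phi(actors).transpose(1, 2))
        adj = F.softmax(rel, dim=-1)      # row-normalized relation graph
        # Aggregate neighbor features, transform, and add residually.
        return actors + F.relu(self.w(torch.bmm(adj, actors)))

out = ActorRelationLayer()(torch.randn(2, 12, 1024))  # -> (2, 12, 1024)
```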

Learning Actor Relation Graphs for Group Activity Recognition [article]

Jianchao Wu, Limin Wang, Li Wang, Jie Guo, Gangshan Wu
2019 arXiv   pre-print
We also visualize the learned actor graphs and relation features, which demonstrate that the proposed ARG is able to capture the discriminative relation information for group activity recognition. ... Thanks to the Graph Convolutional Network, the connections in ARG could be automatically learned from group activity videos in an end-to-end manner, and the inference on ARG could be performed efficiently ... We visualize several examples of the relation graph generated by our model in Figure 3. ...
arXiv:1904.10117v1 fatcat:mzl6ovx3hzfjpf2h2wqjbk5un4

Skeleton based Zero Shot Action Recognition in Joint Pose-Language Semantic Space [article]

Bhavan Jasani, Afshaan Mazagonwalla
2019 arXiv   pre-print
In this work, we present a body-pose-based zero-shot action recognition network and demonstrate its performance on the NTU RGB+D dataset. ... Our model learns to jointly encapsulate visual similarities based on pose features of the action performer as well as similarities in the natural language descriptions of the unseen action class names (a sketch of joint-space inference follows this entry). ... NTU RGB+D [12] is a large-scale dataset for 3-D human activity analysis in an indoor environment. ...
arXiv:1911.11344v1 fatcat:5adeuam35vd5fclts2rzis3wby
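
Zero-shot classification in a joint pose-language space typically works by embedding the skeleton sequence and each class-name description into a shared space, then picking the most similar class. The encoders below are random stand-ins purely to make the flow runnable; they are not the paper's networks:

```python
# Sketch of nearest-class inference in a shared pose-language embedding
# space. Both encoders are hypothetical placeholders (random projections).
import numpy as np

rng = np.random.default_rng(0)
proj = rng.standard_normal((75, 128))

def pose_encoder(seq):   # seq: (frames, 75) = 25 joints x 3 coordinates
    return seq.mean(axis=0) @ proj

def text_encoder(name):  # stand-in for a language-model embedding
    return rng.standard_normal(128)

def classify(pose_seq, class_names):
    z = pose_encoder(pose_seq)
    z = z / np.linalg.norm(z)
    sims = {}
    for name in class_names:
        t = text_encoder(name)
        sims[name] = float(z @ (t / np.linalg.norm(t)))  # cosine similarity
    return max(sims, key=sims.get)

seq = rng.standard_normal((30, 75))  # a 30-frame skeleton sequence
print(classify(seq, ["drinking", "jumping", "waving"]))
```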

Exploring Relations in Untrimmed Videos for Self-Supervised Learning [article]

Dezhao Luo, Bo Fang, Yu Zhou, Yucan Zhou, Dayan Wu, Weiping Wang
2020 arXiv   pre-print
Finally, the network learns representations by predicting the category of the relation between video clips (a sketch of this pretext task follows this entry). ... In this paper, we propose a novel self-supervised method, referred to as Exploring Relations in Untrimmed Videos (ERUV), which can be straightforwardly applied to untrimmed (genuinely unlabeled) videos to learn ... Great progress has been achieved in video action recognition with deep neural networks. ...
arXiv:2008.02711v1 fatcat:zexhmmtazffyjepddxasb2dfoa
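
The pretext task can be sketched without any labels: sample two clips from the same untrimmed video and derive a relation category from their positions. The overlap/adjacent/distant taxonomy below is an illustrative assumption, not ERUV's actual relation set:

```python
# Build self-supervised (clip_a, clip_b, relation_label) triples from an
# untrimmed video by comparing the clips' start positions.
import random

def sample_relation_pair(num_frames, clip_len=16):
    a = random.randint(0, num_frames - clip_len)  # start of clip A
    b = random.randint(0, num_frames - clip_len)  # start of clip B
    gap = abs(b - a)
    if gap < clip_len:
        label = 0   # clips overlap in time
    elif gap < 2 * clip_len:
        label = 1   # clips are temporally adjacent
    else:
        label = 2   # clips are far apart
    return (a, a + clip_len), (b, b + clip_len), label

print(sample_relation_pair(300))  # e.g. ((12, 28), (150, 166), 2)
```

A network trained to predict the label from the two clips must pick up temporal structure, which is the representation-learning signal.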

Qiniu Submission to ActivityNet Challenge 2018 [article]

Xiaoteng Zhang, Yixin Bao, Feiyun Zhang, Kai Hu, Yicheng Wang, Liang Zhu, Qinzhu He, Yining Lin, Jie Shao, Yao Peng
2018 arXiv   pre-print
In this paper, we introduce our submissions for the tasks of trimmed activity recognition (Kinetics) and trimmed event recognition (Moments in Time) for the ActivityNet Challenge 2018. ... In the two tasks, non-local neural networks and temporal segment networks are implemented as our base models (a TSN consensus sketch follows this entry). ... Figure 3 shows our network structure for learning relations; Tables 1 and 2 show our TSN results on the Kinetics and Moments in Time datasets. ...
arXiv:1806.04391v1 fatcat:a4tlvtf7evfc3o5ns5rpdacgsq
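
Since temporal segment networks (TSN) serve as a base model here, the segmental consensus they rely on is worth sketching: score each sampled snippet independently, then average the scores. The backbone and all sizes below are placeholders:

```python
# TSN-style segmental consensus: per-snippet scores averaged over segments.
import torch
import torch.nn as nn

num_segments, num_classes = 3, 400  # e.g. Kinetics-400; sizes assumed
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, num_classes))

def tsn_forward(snippets):  # snippets: (batch, num_segments, 3, 224, 224)
    b, s = snippets.shape[:2]
    scores = backbone(snippets.reshape(b * s, 3, 224, 224))  # per snippet
    return scores.reshape(b, s, -1).mean(dim=1)              # consensus

print(tsn_forward(torch.randn(2, num_segments, 3, 224, 224)).shape)  # (2, 400)
```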

Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks

Nicolae-Catalin Ristea, Liviu Cristian Dutu, Anamaria Radoi
2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)
Experimental results show the effectiveness of the proposed scheme for emotion recognition and the importance of combining visual with audio data. ... In this paper, we propose a system that is able to recognize emotions with a high accuracy rate and in real time, based on deep Convolutional Neural Networks. ... The spatio-temporal relations in the video streams are tackled using a relational autoencoder. ...
doi:10.1109/sped.2019.8906538 dblp:conf/sped/RisteaDR19 fatcat:ivjhman7ybfntpv2akmvfaqsua

Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [article]

Xueying Shi, Yueming Jin, Qi Dou, Jing Qin, Pheng-Ann Heng
2021 arXiv   pre-print
We extensively evaluate our method for gesture recognition using the DESK dataset with the peg transfer procedure. ... Automated surgical gesture recognition is of great importance in robot-assisted minimally invasive surgery. ... Unsupervised domain adaptation for action recognition in natural videos is an emerging research topic. ...
arXiv:2103.04075v2 fatcat:w2agc7ygjzhk5lyieyfxdvqip4

Spatio-Temporal Action Graph Networks [article]

Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell
2019 arXiv   pre-print
We propose a novel inter-object graph representation for activity recognition based on a disentangled graph embedding with direct observation of edge appearance. ... Activity recognition models that represent object interactions explicitly have the potential to learn in a more efficient manner than those that represent scenes with global descriptors. ... This work was completed in partial fulfillment of the Ph.D. degree of the first author. ...
arXiv:1812.01233v2 fatcat:hcx6yyganrcedl5gyvwss3xb5u

Group activity recognition by using effective multiple modality relation representation with temporal-spatial attention

Dezhong Xu, Heng Fu, Lifang Wu, Meng Jian, Dong Wang, Xu Liu
2020 IEEE Access  
In this work, a novel group activity recognition technique is proposed based on multi-modal relation representation with temporal-spatial attention. ... Group activity recognition has received a great deal of interest because of its broad applications in sports analysis, autonomous vehicles, CCTV surveillance systems, and video summarization systems. ... Table 3 reports ablation studies of the recognition accuracies of various modules on the Volleyball dataset. ...
doi:10.1109/access.2020.2979742 fatcat:jmy5xrtc5jexnb2d46uxrxinta

Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [article]

Yonghao Long, Jie Ying Wu, Bo Lu, Yueming Jin, Mathias Unberath, Yun-Hui Liu, Pheng Ann Heng, Qi Dou
2021 arXiv   pre-print
In this regard, we propose a novel online approach, a multi-modal relational graph network (MRG-Net), to dynamically integrate visual and kinematics information through interactive message propagation ... Specifically, we first extract embeddings from video and kinematics sequences with temporal convolutional networks and LSTM units (a sketch follows this entry). ... Figure 1 shows the overview of our proposed multi-modal relational graph network for surgical gesture recognition in robot-assisted surgery. ...
arXiv:2011.01619v2 fatcat:5io4a2qwtrfhpk2skxgoqu4m6i
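
The embedding step named in the snippet (temporal convolutions for video, LSTM units for kinematics) can be sketched in a few lines. All dimensions are assumed, and a plain concatenation stands in for the paper's graph-based message passing:

```python
# Two-stream embedding: a temporal convolution over per-frame video
# features and an LSTM over kinematics, fused by concatenation.
import torch
import torch.nn as nn

class TwoStreamEmbed(nn.Module):
    def __init__(self, vid_dim=512, kin_dim=16, hid=128, num_gestures=10):
        super().__init__()
        self.tcn = nn.Conv1d(vid_dim, hid, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(kin_dim, hid, batch_first=True)
        self.head = nn.Linear(2 * hid, num_gestures)

    def forward(self, video, kinematics):
        # video: (batch, T, vid_dim); kinematics: (batch, T, kin_dim)
        v = self.tcn(video.transpose(1, 2)).transpose(1, 2)  # (B, T, hid)
        k, _ = self.lstm(kinematics)                         # (B, T, hid)
        return self.head(torch.cat([v, k], dim=-1))          # frame-wise scores

out = TwoStreamEmbed()(torch.randn(2, 50, 512), torch.randn(2, 50, 16))
print(out.shape)  # torch.Size([2, 50, 10])
```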

Heterogeneous Non-Local Fusion for Multimodal Activity Recognition

Petr Byvshev, Pascal Mettes, Yu Xiao
2020 Proceedings of the 2020 International Conference on Multimedia Retrieval  
In this work, we investigate activity recognition using multimodal inputs from heterogeneous sensors. Activity recognition is commonly tackled from a single-modal perspective using videos. ... In the network, heterogeneous inputs are fused while maintaining the shapes and dimensionalities that fit each input (a sketch follows this entry). ... Activity recognition from videos has gained a lot of traction in recent years, due to advances in deep networks designed for videos [7, 17, 18, 58, 59, 62]. ...
doi:10.1145/3372278.3390675 dblp:conf/mir/ByvshevMX20 fatcat:5ot2q5pumrdydnnnwc6qqh27nu
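
A shape-preserving non-local fusion across heterogeneous inputs can be sketched as cross-attention: video positions attend over sensor time steps, and the attended values are added back to the video stream. Every size below is an assumption, and this is not the paper's exact block:

```python
# Cross-modal non-local fusion: video queries attend over sensor
# keys/values; the residual addition keeps the video tensor's shape.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalNonLocal(nn.Module):
    def __init__(self, vid_c=256, sens_c=32, inner=64):
        super().__init__()
        self.q = nn.Linear(vid_c, inner)
        self.k = nn.Linear(sens_c, inner)
        self.v = nn.Linear(sens_c, vid_c)
        self.scale = inner ** 0.5

    def forward(self, video, sensor):
        # video: (B, N, vid_c) flattened spatio-temporal positions
        # sensor: (B, T, sens_c) time steps from a wearable sensor
        attn = self.q(video) @ self.k(sensor).transpose(1, 2) / self.scale
        return video + F.softmax(attn, dim=-1) @ self.v(sensor)

fused = CrossModalNonLocal()(torch.randn(2, 196, 256), torch.randn(2, 100, 32))
print(fused.shape)  # torch.Size([2, 196, 256])
```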

Expression Snippet Transformer for Robust Video-based Facial Expression Recognition [article]

Yuanyuan Liu, Wenbin Wang, Chuanxu Feng, Haoyu Zhang, Zhe Chen, Yibing Zhan
2021 arXiv   pre-print
The recent success of the Transformer has provided a new direction for various visual understanding tasks, including video-based facial expression recognition (FER). ... By modeling visual relations effectively, the Transformer has shown its power in describing complicated patterns. ... Video-based emotion recognition using deeply-supervised neural networks. ...
arXiv:2109.08409v1 fatcat:cgjqzktsyrczjdvoeeetpmb2vm
Showing results 1-15 of 84,628.