Learning Temporal Pose Estimation from Sparsely-Labeled Videos [article]

Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani
2019 arXiv   pre-print
To reduce the need for dense annotations, we propose a PoseWarper network that leverages training videos with sparse annotations (every k frames) to learn to perform dense temporal pose propagation and  ...  Modern approaches for multi-person pose estimation in video require large amounts of dense annotations. However, labeling every frame in a video is costly and labor intensive.  ...  In contrast to these prior methods, our primary objective is to learn an effective video pose detector from sparsely labeled videos.  ... 
arXiv:1906.04016v3 fatcat:auz5hm5ykrh6tcjngz26qy4epy

OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos [article]

Kyung-Min Jin, Gun-Hee Lee, Seong-Whan Lee
2022 arXiv   pre-print
We achieve state-of-the-art pose estimation results for PoseTrack2017 and PoseTrack2018 datasets and demonstrate the robustness of our approach to occlusion and motion blur in sparsely annotated video  ...  frame's final pose estimation.  ...  PoseWarper [8] learns the warping mechanism through label propagation in sparsely labeled videos [16] . In addition, DCPose [9] proposes refining a pose using bidirectional frames.  ... 
arXiv:2207.09725v2 fatcat:agwyb2h2pbdh7pbr2tqsyo4juq

DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation [article]

Ailing Zeng, Xuan Ju, Lei Yang, Ruiyuan Gao, Xizhou Zhu, Bo Dai, Qiang Xu
2022 arXiv   pre-print
Unlike current solutions that estimate each frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that only watches sparsely sampled frames, taking advantage of  ...  Specifically, DeciWatch uniformly samples less than 10% video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers  ...  Specifically, DenoiseNet refines sparse poses from pose estimator.  ... 
arXiv:2203.08713v2 fatcat:hqvoirydxzccrg2e57xk2sl2lm

Direct Dense Pose Estimation [article]

Liqian Ma, Lingjie Liu, Christian Theobalt, Luc Van Gool
2022 arXiv   pre-print
We also propose a simple yet effective 2D temporal-smoothing scheme to alleviate the temporal jitters when dealing with video data.  ...  Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies, which finds various applications, such as human body reconstruction, human  ...  The main idea is to use the temporal constraint from the original RGB video.  ... 
arXiv:2204.01263v1 fatcat:q5khfaprtzhvbgh4ogvvaavdii

SemiMultiPose: A Semi-supervised Multi-animal Pose Estimation Framework [article]

Ari Blau, Christoph Gebhardt, Andres Bendesky, Liam Paninski, Anqi Wu
2022 arXiv   pre-print
The resulting algorithm will provide superior multi-animal pose estimation results on three animal experiments compared to the state-of-the-art baseline and exhibits more predictive power in sparsely-labeled  ...  training, which is critical for sparsely-labeled problems.  ...  Conclusion We propose a novel semi-supervised learning framework for multi-animal pose estimation.  ... 
arXiv:2204.07072v1 fatcat:c5npnbrqb5ggzfgqs3fmnqjswu

Towards Generalizable Surgical Activity Recognition Using Spatial Temporal Graph Convolutional Networks [article]

Duygu Sarikaya, Pierre Jannin
2020 arXiv   pre-print
We construct spatial graphs connecting the joint pose estimations of surgical tools.  ...  The proposed modality is based on spatial temporal graph representations of surgical tools in videos, for surgical activity recognition.  ...  Please note that, we intentionally used frames from the same videos in this step, for efficient labeling purposes with minimal effort (as pose estimation is not the focus of this work).  ... 
arXiv:2001.03728v4 fatcat:f2azlpri3jhcbhth576nluzzxy

Author Index

2010 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition  
, Yizhen Learning from Interpolated Images using Neural Networks for Digital Forensics Thormählen, Thorsten Multilinear Pose and Body Shape Estimation of Dressed Subjects from Image Sets Exploiting Global  ...  Applications Hasler, Nils Multilinear Pose and Body Shape Estimation of Dressed Subjects from Image Sets Hayden, David S.  ... 
doi:10.1109/cvpr.2010.5539913 fatcat:y6m5knstrzfyfin6jzusc42p54

Online Learning for Fast Segmentation of Moving Objects [chapter]

Liam Ellis, Vasileios Zografos
2013 Lecture Notes in Computer Science  
We pose this as a discriminative online semi-supervised appearance learning task, where supervising labels are autonomously generated by a motion segmentation algorithm.  ...  In addition, we further exploit the sparse trajectories from the motion segmentation to obtain a simple model that encodes the spatial properties and location of objects at each frame.  ...  For each object, the 2D spatial distribution is estimated from the sparse point set associated with that object label. The set Ω = {1 : y sparse i = ω} is the set of indices for which the label is ω.  ... 
doi:10.1007/978-3-642-37444-9_5 fatcat:aycqmtnpnnhtlfhwnd3wzsgvea

Recognizing Human Actions as the Evolution of Pose Estimation Maps

Mengyuan Liu, Junsong Yuan
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
Instead of relying on the inaccurate human poses estimated from videos, we observe that pose estimation maps, the byproduct of pose estimation, preserve richer cues of human body to benefit action recognition  ...  Most video-based action recognition approaches choose to extract features from the whole video to recognize actions.  ...  Since original estimated pose joints are too sparse to represent the human body, we sort the order of joint labels according to the body structure, and use linear interpolation to sample abundant points  ... 
doi:10.1109/cvpr.2018.00127 dblp:conf/cvpr/LiuY18 fatcat:i3rmf6vm4jh3zhhouxvzwi5a5i

Multi-Instance Multi-Label Action Recognition and Localization Based on Spatio-Temporal Pre-Trimming for Untrimmed Videos

Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Peng Li
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Motivated by the fact that person is the key factor in a human action, we spatially and temporally segment each untrimmed video into person-centric clips with pose estimation and tracking techniques.  ...  Given the bag-of-instances structure associated with video-level labels, action recognition is naturally formulated as a multi-instance multi-label learning problem.  ...  Recurrent human pose estimation. In FG, 468-475.  ... 
doi:10.1609/aaai.v34i07.6986 fatcat:bxywbawmkfetxjy2ji5ln6k55i

ArtTrack: Articulated Multi-person Tracking in the Wild [article]

Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, Bernt Schiele
2017 arXiv   pre-print
We report results on a public MPII Human Pose benchmark and on a new MPII Video Pose dataset of image sequences with multiple people.  ...  Our starting point is a model that resembles existing architectures for single-frame pose estimation but is substantially faster.  ...  The authors thank Varvara Obolonchykova and Bahar Tarakameh for their help in creating the video dataset.  ... 
arXiv:1612.01465v3 fatcat:ma6xy6jzxnc7hgqo6g4y3gkpvm

U4D: Unsupervised 4D Dynamic Scene Understanding

Armin Mustafa, Chris Russell, Adrian Hilton
2019 2019 IEEE/CVF International Conference on Computer Vision (ICCV)  
We further leverage recent advances in 3D pose estimation to constrain the joint semantic instance segmentation and 4D temporally coherent reconstruction.  ...  We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video.  ...  The corresponding joint locations from the 3D pose are backprojected in each view and added to sparse temporal tracks in between key-frames.  ... 
doi:10.1109/iccv.2019.01052 dblp:conf/iccv/Mustafa0H19 fatcat:gljhpdkiwrfrzpkg5cotp7nqva

DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map [article]

Peng Wang, Ruigang Yang, Binbin Cao, Wei Xu, Yuanqing Lin
2018 arXiv   pre-print
Each video frame has ground truth pose from highly accurate motion sensors.  ...  Specifically, we first have an initial coarse camera pose obtained from consumer-grade GPS/IMU, based on which a label map can be rendered from the 3D semantic map.  ...  To incorporate the temporal correlations, the corrected poses from pose CNN are fed into a pose RNN to further improves the estimation accuracy in the stream.  ... 
arXiv:1805.04949v1 fatcat:eqzafvk6bfahpdplgepe3m4mre

DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map

Peng Wang, Ruigang Yang, Binbin Cao, Wei Xu, Yuanqing Lin
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
Each video frame has ground truth pose from highly accurate motion sensors.  ...  Specifically, we first have an initial coarse camera pose obtained from consumer-grade GPS/IMU, based on which a label map can be rendered from the 3D semantic map.  ...  To incorporate the temporal correlations, the corrected poses from pose CNN are fed into a pose RNN to further improves the estimation accuracy in the stream.  ... 
doi:10.1109/cvpr.2018.00614 dblp:conf/cvpr/WangYCXL18 fatcat:rkml3hlwlbdonez2t2xjyvl4uq

Improve Accurate Pose Alignment and Action Localization by Dense Pose Estimation

Yuxiang Zhou, Jiankang Deng, Stefanos Zafeiriou
2018 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018)  
We applied our network to all frames of videos alongside with output from SSN to further improve detection accuracy, especially for pose related and sparsely annotated videos.  ...  In this work we explore the use of shape-based representations as an auxiliary source of supervision for pose estimation.  ...  We apply the additional signals to all frames of videos to generate dense human pose features and combined with results from SSN to further improve accuracy, especially for pose related and sparsely annotated  ... 
doi:10.1109/fg.2018.00077 dblp:conf/fgr/ZhouDZ18 fatcat:gqb5ug3lo5ephfwnaccmgqkmv4