
Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach [article]

Xingyi Zhou, Qixing Huang, Xiao Sun, Xiangyang Xue, Yichen Wei
2017 arXiv   pre-print
In this paper, we study the task of 3D human pose estimation in the wild.  ...  We propose a weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network that presents a two-stage cascaded structure.  ...  This work is supported in part by the National Natural Science Foundation of China (#U1611461, #61572138), Shanghai Municipal Science and Technology Commission (#16JC1420401).  ... 
arXiv:1704.02447v2 fatcat:dzsg56u7lzcsjo47hp6pd57jsi

Lifting 2d Human Pose to 3d: A Weakly Supervised Approach [article]

Sandika Biswas, Sanjana Sinha, Kavya Gupta, Brojeshwar Bhowmick
2019 arXiv   pre-print
Few approaches have utilized training images from both 3d and 2d pose datasets in a weakly-supervised manner for learning 3d poses in unconstrained settings.  ...  In this paper, we propose a method which can effectively predict 3d human pose from 2d pose using a deep neural network trained in a weakly-supervised manner on a combination of ground-truth 3d pose and  ...  We propose a weakly supervised approach for 3d pose estimation from given 2d pose.  ... 
arXiv:1905.01047v1 fatcat:pybcweomxva75ms2qeksqxjuaq

MEBOW: Monocular Estimation of Body Orientation in the Wild

Chenyan Wu, Yukun Chen, Jiajia Luo, Che-Chun Su, Anuja Dawane, Bikramjot Hanzra, Zhuo Deng, Bilan Liu, James Z. Wang, Cheng-hao Kuo
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
We present COCO-MEBOW (Monocular Estimation of Body Orientation in the Wild), a new large-scale dataset for orientation estimation from a single in-the-wild image.  ...  Additionally, we present a novel triple-source solution for 3-D human pose estimation, where 3-D pose labels, 2-D pose labels, and our body-orientation labels are all used in joint training.  ...  Acknowledgments A portion of the computation used the Extreme Science and Engineering Discovery Environment (XSEDE), which is an infrastructure supported by National Science Foundation (NSF) grant number  ... 
doi:10.1109/cvpr42600.2020.00351 dblp:conf/cvpr/WuCLSDHDLWK20 fatcat:bprqjfrcmnbi3avafb4r3u7a2y

Exploiting temporal context for 3D human pose estimation in the wild [article]

Anurag Arnab, Carl Doersch, Andrew Zisserman
2019 arXiv   pre-print
We show that retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data by evaluating on the 3DPW and HumanEVA datasets.  ...  We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos.  ...  Thereafter, in Sec. 5.3 we run our method large-scale on Kinetics videos before using these predictions in Sec. 5.4 as weakly-supervised ground truth to retrain a per-frame 3D pose estimation model as  ... 
arXiv:1905.04266v1 fatcat:fvzjmyfkhbdw5gifazrzdshf74

Domain Adaptive 3D Pose Augmentation for In-the-wild Human Mesh Recovery [article]

Zhenzhen Weng, Kuan-Chieh Wang, Angjoo Kanazawa, Serena Yeung
2022 arXiv   pre-print
We propose Domain Adaptive 3D Pose Augmentation (DAPA), a data augmentation method that enhances the model's generalization ability in in-the-wild scenarios.  ...  A fundamental challenge in human mesh recovery is in collecting the ground truth 3D mesh targets required for training, which requires burdensome motion capturing systems and is often limited to indoor  ...  Given in-the-wild target images, poses estimated using the pretrained body regression network often suffer from being biased towards poses in the source datasets.  ... 
arXiv:2206.10457v1 fatcat:ffped37sufbs5l2uzdzamvfyhq

Online Adaptation for Consistent Mesh Reconstruction in the Wild [article]

Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz
2020 arXiv   pre-print
This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild.  ...  Without requiring annotations of 3D mesh, 2D keypoints, or camera pose for each video frame, we pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test  ...  For human video-based 3D pose estimation, [33] introduces semi-supervised learning to leverage unlabeled videos with a self-supervised component.  ... 
arXiv:2012.03196v1 fatcat:vpvkmtu3qvav7gpgs4ty5ryugq

SPEC: Seeing People in the Wild with an Estimated Camera [article]

Muhammed Kocabas, Chun-Hao P. Huang, Joachim Tesch, Lea Müller, Otmar Hilliges, Michael J. Black
2021 arXiv   pre-print
To address this, we introduce SPEC, the first in-the-wild 3D HPS method that estimates the perspective camera from a single image and employs this to reconstruct 3D human bodies more accurately.  ...  Due to the lack of camera parameter information for in-the-wild images, existing 3D human pose and shape (HPS) estimation methods make several simplifying assumptions: weak-perspective projection, large  ...  Weakly-supervised 3d human pose learning via multi-view images in the wild.  ... 
arXiv:2110.00620v1 fatcat:b2yinrp22jeohgthzvdicnm6g4

Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild [article]

Chaoyang Wang, Chen-Hsuan Lin, Simon Lucey
2021 arXiv   pre-print
The recovery of 3D shape and pose from 2D landmarks stemming from a large ensemble of images can be viewed as a non-rigid structure from motion (NRSfM) problem.  ...  Hitherto, these learning approaches have not been able to effectively model perspective cameras or handle missing/occluded points -- limiting their applicability to in-the-wild datasets.  ...  The first row section lists two state-of-the-art supervised methods as reference. The 2nd section lists weakly supervised methods that use external 3D data.  ... 
arXiv:2001.10090v2 fatcat:aetr4hyigfcczckonuzworfe7m

Towards High-Fidelity 3D Face Reconstruction from In-the-Wild Images Using Graph Convolutional Networks [article]

Jiangke Lin, Yi Yuan, Tianjia Shao, Kun Zhou
2020 arXiv   pre-print
In this paper, we introduce a method to reconstruct 3D facial shapes with high-fidelity textures from single-view images in-the-wild, without the need to capture a large-scale face texture database.  ...  3D Morphable Model (3DMM) based methods have achieved great success in recovering 3D face shapes from single-view images.  ...  Approach We propose a coarse-to-fine approach for 3D face reconstruction. As shown in Fig. 2, our framework is composed of three modules.  ... 
arXiv:2003.05653v3 fatcat:pc6fsavlmzamvmaxt2ocw4tgra

Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
2021 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Our experiments show that this method can recover very accurately the 3D shape of human faces, cat faces and cars from single-view images, without any supervision or a prior shape model.  ...  In order to disentangle these components without supervision, we use the fact that many object categories have, at least approximately, a symmetric structure.  ...  ACKNOWLEDGMENTS We would like to thank Soumyadip Sengupta for sharing with us the code to generate synthetic face datasets, and Mihir Sahasrabudhe for sending us the reconstruction results of Lifting AutoEncoders  ... 
doi:10.1109/tpami.2021.3076536 pmid:33914682 fatcat:iggro2bzcjfxra5zovuzjui2ce

A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

Grigorios G. Chrysos, Epameinondas Antonakos, Patrick Snape, Akshay Asthana, Stefanos Zafeiriou
2017 International Journal of Computer Vision  
A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild".  ...  This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification.  ... 
doi:10.1007/s11263-017-0999-5 pmid:31983805 pmcid:PMC6953975 fatcat:edknqfajlfdvvmceaupqqosp6y

A survey on face detection in the wild: Past, present and future

Stefanos Zafeiriou, Cha Zhang, Zhengyou Zhang
2015 Computer Vision and Image Understanding  
the application of face detection as a first step.  ...  Representative methods will be described in detail, along with a few additional successful methods that we briefly go through at the end.  ...  Weakly Supervised Setting The two main annotation settings for estimating the parameters of DPMs are weakly and strongly supervised.  ... 
doi:10.1016/j.cviu.2015.03.015 fatcat:d7ehtad5dnf5td5kvlbunp5swe

Driver Glance Classification In-the-wild: Towards Generalization Across Domains and Subjects [article]

Sandipan Banerjee, Ajjen Joshi, Jay Turcot, Bryan Reimer, Taniya Mishra
2021 arXiv   pre-print
Finally, we present a weakly supervised multi-domain training regimen that enables the hourglass to jointly learn representations from different domains (varying in camera type, angle), utilizing unlabeled  ...  We propose a model that takes as input a patch of the driver's face along with a crop of the eye-region and classifies their glance into 6 coarse regions-of-interest (ROIs) in the vehicle.  ...  Leveraging the hourglass' auxiliary reconstruction objective, this approach can learn domain invariant representations from very little labeled data in a weakly supervised manner, and consequently reduce  ... 
arXiv:2012.02906v3 fatcat:rjam3ixz6rbdrfibnj2ih75o2a

One-Shot Object Affordance Detection in the Wild [article]

Wei Zhai, Hongchen Luo, Jing Zhang, Yang Cao, Dacheng Tao
2021 arXiv   pre-print
To this end, we devise a One-Shot Affordance Detection Network (OSAD-Net) that firstly estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images  ...  all objects in a scene with the common affordance should be detected.  ...  Subsequently, Li et al. (2019a) extend this work to 3D indoor scenes and construct a 3D pose synthesizer that fuses semantic knowledge from 2D poses extracted from TV shows as well as 3D geometric knowledge  ... 
arXiv:2108.03658v1 fatcat:kinywe25xzh7vc7g3ajakvst6m

An Image is Worth More Than a Thousand Words: Towards Disentanglement in the Wild [article]

Aviv Gabbay, Niv Cohen, Yedid Hoshen
2021 arXiv   pre-print
As an alternative approach, recent methods rely on limited supervision to disentangle the factors of variation and allow their identifiability.  ...  Our success in this challenging setting, demonstrated on synthetic benchmarks, gives rise to leveraging off-the-shelf image descriptors to partially annotate a subset of attributes in real image domains  ... 
arXiv:2106.15610v2 fatcat:242rvtvzjrh6zf7tmvvbmaglxm