11,637 Hits in 6.7 sec

Consistent 3D Hand Reconstruction in Video via self-supervised Learning [article]

Zhigang Tu, Zhisheng Huang, Yujin Chen, Di Kang, Linchao Bao, Bisheng Yang, Junsong Yuan
2022 arXiv   pre-print
Thus we propose S^2HAND, a self-supervised 3D hand reconstruction model that can jointly estimate pose, shape, texture, and the camera viewpoint from a single RGB input through the supervision of easily  ...  Experiments on benchmark datasets demonstrate that our self-supervised approach achieves hand reconstruction performance comparable to recent fully-supervised methods with a single frame as input  ...  Self-supervised Hand Reconstruction from Image Collections: The S^2HAND model learns self-supervised 3D hand reconstruction from image collections by training a 3D hand reconstruction network with the  ... 
arXiv:2201.09548v1 fatcat:sv6463vtzndkdlv2jhsq2itpim

Seeing 3D Objects in a Single Image via Self-Supervised Static-Dynamic Disentanglement [article]

Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann
2022 arXiv   pre-print
These ground plans are 2D grids of features aligned with the ground plane that can be locally decoded into 3D neural radiance fields. Our model is trained in a self-supervised manner via neural rendering.  ...  We learn this skill not via labeled examples, but simply by observing objects move.  ...  Recent work in self-supervised learning has made significant progress towards the goal of object discovery via self-supervised object-centric representation learning for images [1, 2] and videos [3]  ... 
arXiv:2207.11232v1 fatcat:q67bq2mdpvhlngnt3aenu7hvb4

Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [article]

Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, Cordelia Schmid
2020 arXiv   pre-print
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.  ...  We then apply a self-supervised photometric loss that relies on the visual consistency between nearby images.  ...  Our method models the temporal nature of 3D hand and object interactions and leverages motion as a self-supervisory signal for 3D dense hand-object reconstruction.  ... 
arXiv:2004.13449v1 fatcat:gky7kjfxkfa4plfoh7be6icn6y
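The photometric consistency idea in the entry above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the dense flow field stands in for the correspondences the method derives from its reconstructed hand-object meshes, and all names are invented for the example.

```python
import numpy as np

def photometric_loss(frame_t, frame_t1, flow):
    """Mean L1 color difference between frame_t and frame_t1 warped
    back onto frame_t's pixel grid by a dense flow field (H, W, 2)."""
    h, w, _ = frame_t.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    warped = frame_t1[y2, x2]  # nearest-neighbor warp of the nearby frame
    return np.abs(frame_t.astype(float) - warped.astype(float)).mean()
```

If the predicted 3D motion is correct, the warped nearby frame matches the reference frame and the loss is near zero, which is what makes it usable as a self-supervisory signal.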

Leveraging Photometric Consistency Over Time for Sparsely Supervised Hand-Object Reconstruction

Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, Cordelia Schmid
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.  ...  We then apply a self-supervised photometric loss that relies on the visual consistency between nearby images.  ...  Our method models the temporal nature of 3D hand and object interactions and leverages motion as a self-supervisory signal for 3D dense hand-object reconstruction.  ... 
doi:10.1109/cvpr42600.2020.00065 dblp:conf/cvpr/HassonTBLPS20 fatcat:unt2mamyovbibml27ors23p4ru

THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [article]

Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu
2021 arXiv   pre-print
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models, for the task of inferring 3d human shape, joint positions, and global translation.  ...  We present THUNDR, a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people, given monocular RGB images.  ...  Finally, and especially when learning with small supervised training sets or for exploratory self-supervised learning, the lack of regularization given by a body model could lead to 3d predictions with  ... 
arXiv:2106.09336v1 fatcat:pcftvtzwlfa3hov6wmnrsb3btq

3D Shape Reconstruction from a Single 2D Image via 2D-3D Self-Consistency [article]

Yi-Lun Liao, Yao-Cheng Yang, Yu-Chiang Frank Wang
2018 arXiv   pre-print
In this paper, we propose a framework for semi-supervised 3D reconstruction.  ...  Aiming at inferring 3D shapes from 2D images, 3D shape reconstruction has drawn huge attention from researchers in computer vision and deep learning communities.  ...  Feature Disentanglement via 2D-3D Self-Consistency Our proposed model is capable of deep feature disentanglement and pose-aware 3D reconstruction.  ... 
arXiv:1811.12016v1 fatcat:u5mq722khbdete5fcyxjqxh4bm

Learning-based Monocular 3D Reconstruction of Birds: A Contemporary Survey [article]

Seyed Mojtaba Marvasti-Zadeh, Mohammad N.S. Jahromi, Javad Khaghani, Devin Goodsman, Nilanjan Ray, Nadir Erbilgin
2022 arXiv   pre-print
To the best of our knowledge, this work is the first attempt to provide an overview of recent advances in 3D bird reconstruction based on monocular vision and to give both computer vision and biology researchers  ...  Recent advances in 3D vision have led to a number of impressive works on 3D shape and pose estimation, each with different pros and cons.  ...  Table I: Comparison of learning-based monocular 3D bird reconstruction.  ... 
arXiv:2207.04512v2 fatcat:4amonw6u7ree7adqj4iexsvxhm

Self-supervised Single-view 3D Reconstruction via Semantic Consistency [article]

Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, Jan Kautz
2020 arXiv   pre-print
We learn a self-supervised, single-view 3D reconstruction model that predicts the 3D mesh shape, texture and camera pose of a target object from a collection of 2D images and silhouettes.  ...  methods learned with supervision.  ...  Better Part Segmentation via Reconstruction: The proposed 3D reconstruction model can, in turn, be used to improve learning of self-supervised part segmentation [17] (see Figure 10).  ... 
arXiv:2003.06473v1 fatcat:kf4djdg7d5ddzb2wibdlzxfltu

Semi-supervised 3D Hand-Object Pose Estimation via Pose Dictionary Learning [article]

Zida Cheng, Siheng Chen, Ya Zhang
2021 arXiv   pre-print
The proposed pose dictionary learning module can distinguish infeasible poses by their reconstruction error, enabling unlabeled data to provide supervision signals.  ...  To tackle the problem of data collection, we propose a semi-supervised 3D hand-object pose estimation method with two key techniques: pose dictionary learning and an object-oriented coordinate system.  ...  The training process learns such a 3D grasping pose dictionary through self-reconstruction.  ... 
arXiv:2107.07676v1 fatcat:lneazvwzw5cxbiz4j2tm26ve5m
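The reconstruction-error filtering described in the entry above can be sketched in a minimal linear form: score a candidate pose by the residual left after reconstructing it from the dictionary atoms. This is an assumption-laden toy (a plain least-squares projection rather than the paper's learned dictionary), with all names invented for the example.

```python
import numpy as np

def pose_reconstruction_error(pose, dictionary):
    """Residual norm of reconstructing a pose vector from a linear
    pose dictionary (atoms stacked as rows).  A feasible pose lies
    near the span of the atoms and reconstructs with low error; a
    large residual flags the pose as infeasible, so unlabeled
    predictions with high error can be discarded."""
    atoms = dictionary.T                       # (dim, n_atoms)
    coeffs, *_ = np.linalg.lstsq(atoms, pose, rcond=None)
    return float(np.linalg.norm(atoms @ coeffs - pose))
```

A pose inside the dictionary's span scores near zero, while one orthogonal to every atom keeps its full norm as error, which is exactly the signal a semi-supervised filter needs.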

Pushing the Envelope for RGB-Based Dense 3D Hand Pose Estimation via Neural Rendering

Seungryul Baek, Kwang In Kim, Tae-Kyun Kim
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
We adopt a compact parametric 3D hand model that represents deformable and articulated hand meshes.  ...  Experiments using three RGB-based benchmarks show that our framework offers beyond state-of-the-art accuracy in 3D pose estimation, as well as recovering dense 3D hand shapes.  ...  [26] incorporate a 3D hand mesh model into CNNs and learn a mapping between depth maps and mesh model parameters.  ... 
doi:10.1109/cvpr.2019.00116 dblp:conf/cvpr/BaekKK19 fatcat:z5rxp5pwnbaz5jxxrw6en6bbku

LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds [article]

Chun-Liang Li, Tomas Simon, Jason Saragih, Barnabás Póczos, Yaser Sheikh
2019 arXiv   pre-print
We present LBS-AE, a self-supervised autoencoding algorithm for fitting articulated mesh models to point clouds.  ...  To avoid poor local minima from erroneous point-to-point correspondences, we utilize a structured Chamfer distance based on part segmentations, which are learned concurrently using self-supervision.  ...  We study how segmentation learning with self-supervision interacts with model fitting to data. We train different variants of LBS-AE to fit the captured hand data.  ... 
arXiv:1904.10037v1 fatcat:476yx74juvb5fhirk3p5ixn5qy
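The structured Chamfer distance mentioned in the entry above can be sketched as a per-part restriction of the ordinary symmetric Chamfer distance. A rough numpy illustration, not the paper's implementation; the function names and the brute-force pairwise search are choices made for clarity, not for scale:

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def structured_chamfer(scan, scan_labels, model, model_labels):
    """Chamfer distance computed per part label: scan points may only
    match model points carrying the same part label, which rules out
    gross correspondences between, say, two different fingers."""
    parts = sorted(set(scan_labels.tolist()) & set(model_labels.tolist()))
    return sum(chamfer(scan[scan_labels == p], model[model_labels == p])
               for p in parts) / len(parts)
```

Restricting nearest-neighbor search to matching parts is what keeps the fitting objective away from the poor local minima that plain point-to-point Chamfer matching falls into.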

Pushing the Envelope for RGB-based Dense 3D Hand Pose Estimation via Neural Rendering [article]

Seungryul Baek, Kwang In Kim, Tae-Kyun Kim
2019 arXiv   pre-print
We adopt a compact parametric 3D hand model that represents deformable and articulated hand meshes.  ...  Experiments using three RGB-based benchmarks show that our framework offers beyond state-of-the-art accuracy in 3D pose estimation, as well as recovers dense 3D hand shapes.  ...  [26] incorporate a 3D hand mesh model to CNNs and learn a mapping between depth maps and mesh model parameters.  ... 
arXiv:1904.04196v2 fatcat:eqvmlyoynves7al2rtduhvimyi

Hand Image Understanding via Deep Multi-Task Learning [article]

Xiong Zhang, Hongsheng Huang, Jianchao Tan, Hongmin Xu, Cheng Yang, Guozhu Peng, Lei Wang, Ji Liu
2021 arXiv   pre-print
by a coarse-to-fine learning paradigm and a self-supervised learning strategy.  ...  There are various works focusing on recovering hand information from a single image; however, they usually solve a single task, for example, hand mask segmentation, 2D/3D hand pose estimation, or hand mesh  ...  errors between the rendered mask and the estimated segmentation mask via self-supervised learning.  ... 
arXiv:2107.11646v2 fatcat:il4ifqblqvgxxlc27rf7ykhb4y

Uni4Eye: Unified 2D and 3D Self-supervised Pre-training via Masked Image Modeling Transformer for Ophthalmic Image Classification [article]

Zhiyuan Cai and Li Lin and Huaqing He and Xiaoying Tang
2022 arXiv   pre-print
Self-supervised learning (SSL) methods bring huge opportunities for better utilizing unlabeled data, as they do not need massive annotations.  ...  A large-scale labeled dataset is a key factor for the success of supervised deep learning in computer vision.  ...  Self-supervised Vision Transformer (SiT) [1] conducts image reconstruction, rotation prediction and contrastive learning tasks for pretraining, which outperforms randomly-weighted initialization and  ... 
arXiv:2203.04614v2 fatcat:hdux5mblnrc2vniadl5k7gmvci
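The masked image modeling pretext task underlying the entry above can be sketched at its simplest: hide a random subset of image patches and train a model to reconstruct them. The snippet below only shows the masking step, with invented names and no transformer; it is a toy under those assumptions, not Uni4Eye's code.

```python
import numpy as np

def mask_patches(image, patch, ratio, rng):
    """Split an image into non-overlapping square patches and zero out
    a random fraction of them.  Returns the masked image and the
    boolean visibility grid, so a reconstruction loss can be applied
    to the hidden patches only."""
    h, w = image.shape[:2]
    gh, gw = h // patch, w // patch
    keep = rng.random((gh, gw)) >= ratio          # True = patch stays visible
    masked = image.copy()
    for i in range(gh):
        for j in range(gw):
            if not keep[i, j]:
                masked[i * patch:(i + 1) * patch,
                       j * patch:(j + 1) * patch] = 0
    return masked, keep
```

Because the targets are the image's own hidden patches, the pretraining needs no annotations at all, which is the "huge opportunity" the snippet refers to.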

Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction [article]

Kalyan Vasudev Alwala, Abhinav Gupta, Shubham Tulsiani
2022 arXiv   pre-print
Our work learns a unified model for single-view 3D reconstruction of objects from hundreds of semantic categories.  ...  As a scalable alternative to direct 3D supervision, our work relies on segmented image collections for learning 3D of generic categories.  ...  On the other hand, obtaining 3D supervision for images of generic objects is extremely hard.  ... 
arXiv:2204.03642v1 fatcat:uffe7ngtaffxza4aypqi622uqa
Showing results 1 — 15 out of 11,637 results