Learning 6-DOF Grasping Interaction via Deep Geometry-aware 3D Representations [article]

Xinchen Yan, Jasmine Hsu, Mohi Khansari, Yunfei Bai, Arkanath Pathak, Abhinav Gupta, James Davidson, Honglak Lee
2018 arXiv   pre-print
grid) from RGBD input via generative 3D shape modeling.  ...  Specifically, we formulate the learning of deep geometry-aware grasping model in two steps: First, we learn to build mental geometry-aware representation by reconstructing the scene (i.e., 3D occupancy  ...  In [41], an in-network projection layer is introduced for 3D shape learning from 2D masks (e.g. 2D silhouette of object).  ...
arXiv:1708.07303v4 fatcat:73paaikohjg3jivjlt6l3mz76u

Graph-based Hand-Object Meshes and Poses Reconstruction with Multi-Modal Input

Murad Almadani, Ahmed Elhayek, Jameel Malik, Didier Stricker
2021 IEEE Access  
For the best accuracy, we adopted a multi-modal representation of the input by combining the V RGBD 3D representation (i.e. voxelized RGBD) and the color 2D representation (i.e. RGB image).  ...  We show the 2D pose projection on its corresponding RGB image along with two different viewpoints of both 3D pose and shape for better visualization comparison.  ... 
doi:10.1109/access.2021.3117473 fatcat:vqci752wczf2jc5tfxahp3nrbi

GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision [article]

Lei Ke, Shichao Li, Yanan Sun, Yu-Wing Tai, Chi-Keung Tang
2020 arXiv   pre-print
Based on a divide-and-conquer 3D shape representation strategy, GSNet reconstructs 3D vehicle shape with great detail (1352 vertices and 2700 faces).  ...  We present a novel end-to-end framework named as GSNet (Geometric and Scene-aware Network), which jointly estimates 6DoF poses and reconstructs detailed 3D car shapes from single urban street view.  ...  Finally, multi-task prediction is done in parallel to estimate 3D translation, rotation and shape via the intermediate fused representations. Diverse Feature Extraction and Representation.  ... 
arXiv:2007.13124v1 fatcat:nc23kh23dngz7ox5jp25sojd2i

Inferring the 3D Standing Spine Posture from 2D Radiographs [article]

Amirhossein Bayat, Anjany Sekuboyina, Johannes C. Paetzold, Christian Payer, Darko Stern, Martin Urschler, Jan S. Kirschke, Bjoern H. Menze
2021 arXiv   pre-print
This work aims to integrate the two realms, i.e. it combines the upright spinal curvature from radiographs with the 3D vertebral shape from CT imaging for synthesizing an upright 3D model of spine, loaded  ...  We validate our architecture on digitally reconstructed radiographs, achieving a 3D reconstruction Dice of 95.52%, indicating an almost perfect 2D-to-3D domain translation.  ...  The map&fuse block is responsible for mapping 2D representations of each the sagittal and coronal views into intermediate 3D latent representations followed by fusing them into a single 3D representation  ... 
arXiv:2007.06612v2 fatcat:grpvoikpfzdd7fp32bx7lwtlpa

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing [article]

Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager and Manmohan Chandraker
2017 arXiv   pre-print
We train the network only on synthetic data and demonstrate state-of-the-art performances on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint  ...  We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D image and 3D space while inferring their visibility states, given a single RGB image.  ...  Recently, given a single image, autoencoders have been exploited for 2D image rendering [5], multi-view mesh reconstruction [34] and 3D shape regression under occlusion [25].  ...
arXiv:1612.02699v3 fatcat:pcnymzswyjhxblrdvsm5gnex3y

Image-based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era

Xianfeng Han, Hamid Laga, Mohammed Bennamoun
2019 IEEE Transactions on Pattern Analysis and Machine Intelligence  
We focus on the works which use deep learning techniques to estimate the 3D shape of generic objects either from a single or multiple RGB images.  ...  3D reconstruction is a longstanding ill-posed problem, which has been explored for decades by the computer vision, computer graphics, and machine learning communities.  ...  INTRODUCTION The goal of image-based 3D reconstruction is to infer the 3D geometry and structure of objects and scenes from one or multiple 2D images.  ... 
doi:10.1109/tpami.2019.2954885 pmid:31751229 fatcat:hc76yes6avdy5byyy7flovj5wa

Multi-view Human Pose and Shape Estimation Using Learnable Volumetric Aggregation [article]

Soyong Shin, Eni Halilaj
2020 arXiv   pre-print
In this paper, we propose a learnable volumetric aggregation approach to reconstruct 3D human body pose and shape from calibrated multi-view images.  ...  Human pose and shape estimation from RGB images is a highly sought after alternative to marker-based motion capture, which is laborious, requires expensive equipment, and constrains capture to laboratory  ...  Pose estimation with parametric body models Parametric 3D human body models [24, 13, 30] enable reconstruction of 3D body shape and pose from 2D images.  ... 
arXiv:2011.13427v1 fatcat:3wxmrmul6nfnxghoystf2s2che

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points [article]

Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu
2019 arXiv   pre-print
Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainties and improve  ...  To address this challenge, we propose to adopt perspective points as a new intermediate representation for 3D object detection, defined as the 2D projections of local Manhattan 3D keypoints to locate an  ...  Factoring shape, pose, and layout from the 2d image of a 3d scene.  ... 
arXiv:1912.07744v1 fatcat:6v6rh2uiunfwhbsokfnbcws43m

BodyNet: Volumetric Inference of 3D Human Body Shapes [chapter]

Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid
2018 Lecture Notes in Computer Science  
In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image.  ...  BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and  ...  3D human body shape and 3D body parts from a single image.  ... 
doi:10.1007/978-3-030-01234-2_2 fatcat:phmavgdyl5fcbak3u25thqa5su

BodyNet: Volumetric Inference of 3D Human Body Shapes [article]

Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid
2018 arXiv   pre-print
In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image.  ...  BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and  ...  3D human body shape and 3D body parts from a single image.  ... 
arXiv:1804.04875v3 fatcat:r77wnwutnvao3gd6p3jpujs3ua

Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction [article]

Tong He, John Collomosse, Hailin Jin, Stefano Soatto
2020 arXiv   pre-print
We propose Geo-PIFu, a method to recover a 3D mesh from a monocular color image of a clothed person.  ...  We show that, by both encoding query points and constraining global shape using latent voxel features, the reconstruction we obtain for clothed human meshes exhibits less shape distortion and improved  ...  Feature Fusion Architectures Knowing that the integrated representations lead to the best mesh reconstruction performance, we now further explore different 3D / 2D feature fusion architectures.  ... 
arXiv:2006.08072v2 fatcat:7qbnswptdfapzh2e5bxygvrsgy

Recovering 3D Human Mesh from Monocular Images: A Survey [article]

Yating Tian, Hongwen Zhang, Yebin Liu, Limin Wang
2022 arXiv   pre-print
Estimating human pose and shape from monocular images is a long-standing problem in computer vision.  ...  Meanwhile, continuous efforts are devoted to improving the quality of 3D mesh labels for a wide range of datasets.  ...  [143] explore a R-CNN-based architecture for detection and estimation for all people in the image.  ... 
arXiv:2203.01923v2 fatcat:vb6xa5wdsrhdxd2ebvg54qq2m4

Learning 3D Face Reconstruction with a Pose Guidance Network [article]

Pengpeng Liu, Xintong Han, Michael Lyu, Irwin King, Jia Xu
2020 arXiv   pre-print
face geometry from a single image.  ...  With our specially designed PGN, our model can learn from both faces with fully labeled 3D landmarks and unlimited unlabeled in-the-wild face images.  ...  We also thank Yao Feng, Feng Liu and Ayush Tewari for kind help.  ... 
arXiv:2010.04384v1 fatcat:j7xy3eks7rfsrbr3ildiqpjwea

Pushing the Envelope for RGB-Based Dense 3D Hand Pose Estimation via Neural Rendering

Seungryul Baek, Kwang In Kim, Tae-Kyun Kim
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Estimating 3D hand meshes from single RGB images is challenging, due to intrinsic 2D-3D mapping ambiguities and limited training data.  ...  network and a differentiable renderer, supervised by 2D segmentation masks and 3D skeletons.  ...  Recovering 2D skeletal representations from RGB images has been greatly improved [43].  ...
doi:10.1109/cvpr.2019.00116 dblp:conf/cvpr/BaekKK19 fatcat:z5rxp5pwnbaz5jxxrw6en6bbku

Pushing the Envelope for RGB-based Dense 3D Hand Pose Estimation via Neural Rendering [article]

Seungryul Baek, Kwang In Kim, Tae-Kyun Kim
2019 arXiv   pre-print
Estimating 3D hand meshes from single RGB images is challenging, due to intrinsic 2D-3D mapping ambiguities and limited training data.  ...  network and a differentiable renderer, supervised by 2D segmentation masks and 3D skeletons.  ...  Recovering 2D skeletal representations from RGB images has been greatly improved [43].  ...
arXiv:1904.04196v2 fatcat:eqvmlyoynves7al2rtduhvimyi