4,057 Hits in 7.1 sec

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene [article]

Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik
2018 arXiv   pre-print
The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects  ...  represented in terms of shape and pose.  ...  We gratefully acknowledge NVIDIA corporation for the donation of Tesla GPUs used for this research.  ... 
arXiv:1712.01812v2 fatcat:cwpfek42s5bvzdvkh5iciy6gk4

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects  ...  represented in terms of shape and pose.  ...  We gratefully acknowledge NVIDIA corporation for the donation of Tesla GPUs used for this research.  ... 
doi:10.1109/cvpr.2018.00039 dblp:conf/cvpr/TulsianiGFEM18 fatcat:roavoix2mfej5gnfku6ettj35q
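
A minimal sketch, not the authors' code, of the factored representation the two entries above describe: a single layout for the enclosing surfaces plus a set of objects, each carrying its own shape and pose. All class names, field names, and array shapes below are assumptions for illustration.

```python
# Hypothetical data structure: layout + per-object (shape, pose) factors.
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class SceneObject:
    shape_voxels: np.ndarray   # e.g. a 32^3 occupancy grid (assumed shape format)
    rotation: np.ndarray       # 3x3 rotation matrix
    translation: np.ndarray    # 3-vector in camera coordinates
    scale: np.ndarray          # per-axis scale


@dataclass
class FactoredScene:
    layout: np.ndarray                      # depth map of the enclosing surfaces
    objects: List[SceneObject] = field(default_factory=list)


# Example: an "empty room" layout with one placeholder object.
scene = FactoredScene(
    layout=np.full((128, 128), 5.0),        # constant 5 m depth as a stand-in
    objects=[SceneObject(
        shape_voxels=np.ones((32, 32, 32)),
        rotation=np.eye(3),
        translation=np.array([0.0, 0.0, 3.0]),
        scale=np.ones(3),
    )],
)
print(len(scene.objects), scene.layout.shape)
```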

Holistic 3D Scene Understanding from a Single Image with Implicit Representation [article]

Cheng Zhang, Zhaopeng Cui, Yinda Zhang, Bing Zeng, Marc Pollefeys, Shuaicheng Liu
2021 arXiv   pre-print
We present a new pipeline for holistic 3D scene understanding from a single image, which could predict object shapes, object poses, and scene layout.  ...  We not only propose an image-based local structured implicit network to improve the object shape estimation, but also refine the 3D object pose and scene layout via a novel implicit scene graph neural  ...  The input image is also fed into a Layout Estimation Network (LEN) to produce a 3D layout bounding box and relative camera pose.  ... 
arXiv:2103.06422v3 fatcat:i5mmoev4gzfo5h7jepbti35oq4
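
A rough sketch of the layout branch mentioned in the snippet above, where the input image is fed to a Layout Estimation Network (LEN) that outputs a 3D layout bounding box and a relative camera pose. The layer sizes and output parameterisation below are assumptions, not the paper's architecture.

```python
# Illustrative only: a toy LEN mapping an image to a layout box and a camera pose.
import torch
import torch.nn as nn


class LayoutEstimationNetwork(nn.Module):
    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        # Stand-in backbone; the real model would use a CNN feature extractor.
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.layout_head = nn.Linear(feat_dim, 6)   # assumed: box centre (3) + size (3)
        self.camera_head = nn.Linear(feat_dim, 3)   # assumed: pitch, roll, camera height

    def forward(self, image: torch.Tensor):
        features = self.backbone(image)
        return self.layout_head(features), self.camera_head(features)


len_net = LayoutEstimationNetwork()
layout_box, camera_pose = len_net(torch.randn(1, 3, 64, 64))
print(layout_box.shape, camera_pose.shape)   # torch.Size([1, 6]) torch.Size([1, 3])
```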

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization [article]

Cheng Zhang, Zhaopeng Cui, Cai Chen, Shuaicheng Liu, Bing Zeng, Hujun Bao, Yinda Zhang
2021 arXiv   pre-print
In this paper, we propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view  ...  Panorama images have a much larger field-of-view thus naturally encode enriched scene context information compared to standard perspective images, which however is not well exploited in the previous scene  ...  understanding from a single full-view panorama image, which recovers the 3D room layout and the shape, pose, position, and semantic category of each object in the scene.  ... 
arXiv:2108.10743v1 fatcat:knc6p65etvfihcb6rey3oihrfu

Pano2CAD: Room Layout From A Single Panorama Image [article]

Jiu Xu, Bjorn Stenger, Tommi Kerola, Tony Tung
2016 arXiv   pre-print
This paper presents a method of estimating the geometry of a room and the 3D pose of objects from a single 360-degree panorama image.  ...  The method combines surface normal estimation, 2D object detection and 3D object pose estimation.  ...  From these we obtain a first scene layout up to an unknown scale. Next, objects are detected using a trained detector and initial 3D poses are estimated using a library of 3D models.  ... 
arXiv:1609.09270v2 fatcat:sm42hneuurgzzaccvzau5mk2bu

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image [article]

Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu
2018 arXiv   pre-print
Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes.  ...  We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed of a set of CAD models using a stochastic grammar model.  ...  and the 3D geometric structure of an indoor scene from a single RGB image.  ... 
arXiv:1808.02201v1 fatcat:ulqbp66cnfbnxmqnu4np6enxji

360-DFPE: Leveraging Monocular 360-Layouts for Direct Floor Plan Estimation [article]

Bolivar Solarte, Yueh-Cheng Liu, Chin-Hsuan Wu, Yi-Hsuan Tsai, Min Sun
2022 arXiv   pre-print
Since our task is to sequentially capture the floor plan using monocular images, the entire scene structure, room instances, and room shapes are unknown.  ...  Our approach leverages a loosely coupled integration between a monocular visual SLAM solution and a monocular 360-room layout approach, which estimate camera poses and layout geometries, respectively.  ...  ACKNOWLEDGMENTS This work is supported by the MOST Joint Research Center for AI Technology and All Vista Healthcare, Taiwan Computing Cloud, and MOST 110-2634-F-007-016.  ... 
arXiv:2112.06180v3 fatcat:xt3t7w2xoretnmfsfbuxqhx7vy
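
A minimal sketch, on synthetic values, of the loose coupling described above: per-frame layout geometry (floor-plane corners in camera coordinates) is mapped into a shared world frame using the camera poses from visual SLAM and accumulated toward a floor plan. The pose representation and corner coordinates are placeholders.

```python
# Fuse per-frame layout corners with SLAM camera poses on the floor plane (x-z).
import numpy as np


def se2(x, z, yaw):
    # Camera-to-world transform on the floor plane: 2D rotation + translation.
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x], [s, c, z], [0.0, 0.0, 1.0]])


# (camera pose, layout corners in camera coordinates) for two synthetic frames.
frames = [
    (se2(0.0, 0.0, 0.0),        np.array([[-2, 1], [2, 1], [2, 4], [-2, 4]], float)),
    (se2(0.0, 2.0, np.pi / 2),  np.array([[-1, 1], [1, 1], [1, 3], [-1, 3]], float)),
]

floor_plan = []
for pose, corners in frames:
    homogeneous = np.hstack([corners, np.ones((len(corners), 1))])
    floor_plan.append((pose @ homogeneous.T).T[:, :2])   # corners in world coordinates
floor_plan = np.vstack(floor_plan)
print(np.round(floor_plan, 2))
```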

3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

Ishani Chakraborty, Hui Cheng, Omar Javed
2013 2013 IEEE Conference on Computer Vision and Pattern Recognition  
The proposed method uses 2D face locations from a single image to estimate the camera pose and the spatial arrangement of people in 3D. Figure 3: Taxonomy of attributes for Visual Proxemics.  ...  Our 3D shape descriptors are invariant to camera pose variations often seen in web images and videos. The proposed approach also estimates camera pose and uses it to capture the intent of the photo.  ... 
doi:10.1109/cvpr.2013.437 dblp:conf/cvpr/ChakrabortyCJ13 fatcat:j535v2bljjazdgccjobmhi4nea
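
A tiny worked example, not the paper's algorithm, of the geometry that makes this feasible: under a pinhole camera with focal length f, a face of known physical height H that appears h pixels tall lies at a depth of roughly f·H/h, so 2D face detections can be lifted to a coarse 3D arrangement of people. The focal length, canonical face size, and detections below are assumed values.

```python
# Lift 2D face detections to rough 3D positions via the pinhole model.
import numpy as np

f_pixels = 800.0        # assumed focal length in pixels
face_height_m = 0.24    # assumed canonical face height in metres
cx, cy = 320.0, 240.0   # assumed principal point

# Each row: face centre (u, v) in pixels and face height h in pixels.
faces = np.array([[320.0, 200.0, 60.0],
                  [500.0, 210.0, 30.0]])

depths = f_pixels * face_height_m / faces[:, 2]          # z = f * H / h
xs = (faces[:, 0] - cx) * depths / f_pixels
ys = (faces[:, 1] - cy) * depths / f_pixels
positions_3d = np.stack([xs, ys, depths], axis=1)        # metres, camera coordinates
print(np.round(positions_3d, 2))   # rows: [0. -0.16 3.2] and [1.44 -0.24 6.4]
```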

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild [article]

Jason Y. Zhang and Sam Pepose and Hanbyul Joo and Deva Ramanan and Jitendra Malik and Angjoo Kanazawa
2020 arXiv   pre-print
We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment.  ...  In particular, we introduce a scale loss that learns the distribution of object size from data; an occlusion-aware silhouette re-projection loss to optimize object pose; and a human-object interaction  ...  This work was funded in part by the CMU Argo AI Center for Autonomous Vehicle Research.  ... 
arXiv:2007.15649v2 fatcat:yccc5eaccncava3knmyw5dyzbe
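
A loose sketch of how the three objectives named above (object scale, occlusion-aware silhouette re-projection, human-object interaction) could be combined into a single optimisation target. The function bodies, inputs, and weights are placeholders, not the authors' formulation.

```python
# Hypothetical combination of the three losses into one differentiable objective.
import torch


def scale_loss(log_scale, prior_mean, prior_std):
    # Penalise object scales that are unlikely under a learned size prior.
    return (((log_scale - prior_mean) / prior_std) ** 2).mean()


def silhouette_loss(rendered_mask, detected_mask, visible_mask):
    # Occlusion-aware: only compare pixels predicted to be visible.
    diff = (rendered_mask - detected_mask) ** 2
    return (diff * visible_mask).sum() / visible_mask.sum().clamp(min=1.0)


def interaction_loss(human_points, object_points):
    # Pull interacting human and object surface points towards each other.
    return torch.cdist(human_points, object_points).min(dim=1).values.mean()


log_scale = torch.zeros(1, requires_grad=True)
rendered, detected = torch.rand(64, 64), torch.rand(64, 64)
visible = torch.ones(64, 64)
human_pts, obj_pts = torch.rand(100, 3), torch.rand(100, 3)

total = (1.0 * scale_loss(log_scale, torch.tensor(0.0), torch.tensor(1.0))
         + 1.0 * silhouette_loss(rendered, detected, visible)
         + 0.1 * interaction_loss(human_pts, obj_pts))
total.backward()
print(float(total), log_scale.grad)
```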

IM2CAD [article]

Hamid Izadinia, Qi Shan, Steven M. Seitz
2017 arXiv   pre-print
Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects  ...  Our approach iteratively optimizes the placement and scale of objects in the room to best match scene renderings to the input photo, using image comparison metrics trained via deep convolutional neural  ...  Acknowledgements This work was supported by funding from National Science Foundation grant IIS-1250793, Google, and the UW Animation Research Labs.  ... 
arXiv:1608.05137v2 fatcat:p2mvia5vlrdr5a2j4akm5ntece
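
A toy sketch of the render-and-compare loop described above: perturb an object's placement and scale, re-render, and keep the change whenever the rendering matches the photo better. The "renderer" and pixel-wise cost here are trivial stand-ins for the scene renderings and CNN-trained comparison metrics the paper actually uses.

```python
# Random-search refinement of placement (cx, cy) and scale s against a target image.
import numpy as np

rng = np.random.default_rng(0)


def render(params, size=64):
    # Stand-in renderer: draw a filled square for an object at (cx, cy) with half-size s.
    img = np.zeros((size, size))
    cx, cy, s = params
    x0, x1 = int(cx - s), int(cx + s)
    y0, y1 = int(cy - s), int(cy + s)
    img[max(y0, 0):max(y1, 0), max(x0, 0):max(x1, 0)] = 1.0
    return img


target = render((40.0, 30.0, 10.0))       # the "photo" we try to match
params = np.array([20.0, 20.0, 5.0])      # initial placement and scale


def cost(p):
    return float(((render(tuple(p)) - target) ** 2).mean())


for _ in range(2000):
    proposal = params + rng.normal(scale=[2.0, 2.0, 1.0])
    if cost(proposal) < cost(params):     # keep proposals that match the photo better
        params = proposal
print(np.round(params, 1), cost(params))
```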

Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes [article]

Haolin Liu, Yujian Zheng, Guanying Chen, Shuguang Cui, Xiaoguang Han
2022 arXiv   pre-print
We present a new framework to reconstruct holistic 3D indoor scenes including both room background and indoor objects from single-view images.  ...  Existing methods can only produce 3D shapes of indoor objects with limited geometry quality because of the heavy occlusion of indoor scenes.  ...  From left to right: input image, the scene reconstructed by our method, results of Total3D [34], Im3D [57] and our method in a different camera pose.  ... 
arXiv:2207.08656v2 fatcat:5yrv6mxk6fgjbc3d3b4ukm4n7i

Joint 3D Object and Layout Inference from a Single RGB-D Image [chapter]

Andreas Geiger, Chaohui Wang
2015 Lecture Notes in Computer Science  
Inferring 3D objects and the layout of indoor scenes from a single RGB-D image captured with a Kinect camera is a challenging task.  ...  Towards this goal, we propose a high-order graphical model and jointly reason about the layout, objects and superpixels in the image.  ...  In particular, we reason about the type, semantic class, 3D pose and 3D shape of each object and layout element.  ... 
doi:10.1007/978-3-319-24947-6_15 fatcat:yg4mq7f2vnajnoc2r6gf3bbqcu

Scene shape from texture of objects

Nadia Payet, Sinisa Todorovic
2011 CVPR 2011  
Tests against ground truth obtained from stereo images demonstrate that we can coarsely reconstruct a 3D model of the scene from a single image, without learning the layout of common scene surfaces, as  ...  We present an approach to: (1) detecting distinct textures of objects in a scene, (2) reconstructing the 3D shape of detected texture surfaces, and (3) combining object detections and shape-from-texture  ...  Deformations of texture elements from the known canonical pose can be used to estimate the underlying 3D shape of the texture surface.  ... 
doi:10.1109/cvpr.2011.5995326 dblp:conf/cvpr/PayetT11 fatcat:tje3cu6bg5fypchszwsoan3jde
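
A tiny worked example of the shape-from-texture cue in the snippet above: a circular texture element seen at a slant projects to an ellipse, so the ratio of the observed minor to major axis gives the surface slant as its arccosine. The measurements are made up.

```python
# Slant of a surface patch from the foreshortening of a circular texture element.
import numpy as np

major_axis = 20.0   # pixels, observed
minor_axis = 10.0   # pixels, observed

slant = np.degrees(np.arccos(minor_axis / major_axis))
print(f"estimated surface slant: {slant:.1f} degrees")   # 60.0
```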

Complete 3D Scene Parsing from an RGBD Image [article]

Chuhang Zou, Ruiqi Guo, Zhizhong Li, Derek Hoiem
2018 arXiv   pre-print
One major goal of vision is to infer physical models of objects, surfaces, and their layout from sensors. In this paper, we aim to interpret indoor scenes from one RGBD image.  ...  Our representation encodes the layout of orthogonal walls and the extent of objects, modeled with CAD-like 3D shapes.  ...  We thank David Forsyth for insightful comments and discussion and Saurabh Singh, Kevin Shih and Tanmay Gupta for their comments on an earlier version of the manuscript.  ... 
arXiv:1710.09490v2 fatcat:mctuctwslvgdnkt2l5shoruolq

Marr Revisited: 2D-3D Alignment via Surface Normal Prediction

Aayush Bansal, Bryan Russell, Abhinav Gupta
2016 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
We use the image and predicted surface normals to retrieve a 3D model from a large library of object CAD models.  ...  When using the predicted surface normals, our two-stream network matches prior work using surface normals computed from RGB-D images on the task of pose prediction, and achieves state of the art when using  ...  We thank Saining Xie for discussion on skip-network architectures, David Fouhey for providing code to compute normals from Kinect data, and Saurabh Gupta for help with the pose estimation evaluation setup  ... 
doi:10.1109/cvpr.2016.642 dblp:conf/cvpr/BansalRG16 fatcat:oz54dqxt4feh5njx7xiaru3iki
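
A small sketch of the retrieval step described above, assuming a normal map has been pre-rendered for each CAD model in the library: score every model by the mean per-pixel cosine similarity between its normals and the normals predicted from the image, then return the best match. The model names and normal maps are random placeholders.

```python
# Retrieve the CAD model whose rendered normals best agree with the predicted ones.
import numpy as np

rng = np.random.default_rng(1)


def normalize(n):
    return n / np.clip(np.linalg.norm(n, axis=-1, keepdims=True), 1e-8, None)


def normal_similarity(predicted, rendered):
    # Mean cosine similarity between per-pixel unit normals.
    return float((normalize(predicted) * normalize(rendered)).sum(-1).mean())


predicted = rng.normal(size=(32, 32, 3))                # normals predicted from the image
library = {name: rng.normal(size=(32, 32, 3))           # pre-rendered normals per CAD model
           for name in ["chair_a", "chair_b", "sofa_c"]}

best = max(library, key=lambda name: normal_similarity(predicted, library[name]))
print("retrieved model:", best)
```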
Showing results 1 — 15 out of 4,057 results