3D-to-2D Distillation for Indoor Scene Parsing [article]

Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu
2021 arXiv   pre-print
Indoor scene semantic parsing from RGB images is very challenging due to occlusions, object distortion, and viewpoint variations.  ...  First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during the training, so the 2D network can infer without requiring  ...  The code can be found in https://github.com/liuzhengzhe/3D-to-2D-Distillation-for-Indoor-Scene-Parsing.  ... 
arXiv:2104.02243v2 fatcat:gxxz3xjtqvee3ophixmm4dfi5a

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image [chapter]

Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu
2018 Lecture Notes in Computer Science  
Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes.  ...  We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set of CAD models using a stochastic grammar model.  ...  Fig. 1: Illustration of the proposed holistic 3D indoor scene parsing and reconstruction in an analysis-by-synthesis fashion.  ... 
doi:10.1007/978-3-030-01234-2_12 fatcat:n2lyd2g6sve6lartrjexgdal3q

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image [article]

Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu
2018 arXiv   pre-print
Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes.  ...  We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set of CAD models using a stochastic grammar model.  ...  Fig. 1: Illustration of the proposed holistic 3D indoor scene parsing and reconstruction in an analysis-by-synthesis fashion.  ... 
arXiv:1808.02201v1 fatcat:ulqbp66cnfbnxmqnu4np6enxji

Data Efficient 3D Learner via Knowledge Transferred from 2D Model [article]

Ping-Chung Yu, Cheng Sun, Min Sun
2022 arXiv   pre-print
Collecting and labeling the registered 3D point cloud is costly. As a result, 3D resources for training are typically limited in quantity compared to the 2D images counterpart.  ...  Specifically, we utilize a strong and well-trained semantic segmentation model for 2D images to augment RGB-D images with pseudo-label. The augmented dataset can then be used to pre-train 3D models.  ...  We use RGB-D images as the bridge to transfer the knowledge from a strong and well-trained 2D scene parsing network to 3D models.  ... 
arXiv:2203.08479v2 fatcat:4xhrrwld7ngs3kz4b6ry6ba364

Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective [article]

Wu Liu, Qian Bao, Yu Sun, Tao Mei
2021 arXiv   pre-print
2D and 3D, and the complex multi-person scenarios.  ...  In this paper, we provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem. We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks.  ...  For multi-person scenes, to estimate the 2D or 3D pose of each person, existing works exploit the top-down paradigm or bottom-up paradigm.  ... 
arXiv:2104.11536v1 fatcat:tdag2jq2vjdrjekwukm5nu7l6a

Predictive and Semantic Layout Estimation for Robotic Applications in Manhattan Worlds [article]

Armon Shariati, Bernd Pfrommer, Camillo J. Taylor
2018 arXiv   pre-print
The scheme can be run in an online manner to build water tight representations of the environment.  ...  The system effectively speculates about room boundaries and free space regions which provides useful guidance to subsequent motion planning systems.  ...  Note however that the system still analyzes a complete 3D point cloud and 3D trajectory to produce the distilled floor plan.  ... 
arXiv:1811.07442v1 fatcat:si5s2e3plnavrcrgt7mmp6ppma

The Fusion Strategy of 2D and 3D Information Based on Deep Learning: A Review

Jianghong Zhao, Yinrui Wang, Yuee Cao, Ming Guo, Xianfeng Huang, Ruiju Zhang, Xintong Dou, Xinyu Niu, Yuanyuan Cui, Jun Wang
2021 Remote Sensing  
Using 2D and 3D information fusion for the advantages of compensation and accuracy improvement has become a hot research topic.  ...  Moreover, according to the methods included in this paper, the 2D information and 3D information of different methods come from various kinds of data.  ...  Data sharing is not applicable to this article. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/rs13204029 fatcat:onnjeqvwb5gsjcrhdaq6hiekru

Variational Context-Deformable ConvNets for Indoor Scene Parsing

Zhitong Xiong, Yuan Yuan, Nianhui Guo, Qi Wang
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Especially in indoor scenes, the large variation of object scales makes spatial-context an important factor for improving the segmentation performance.  ...  Bayesian probabilistic modeling is introduced for the training of VCD module, which can make it continuous and more stable; 3) a perspective-aware guidance module is designed to take advantage of multi-modal  ...  Related Work. RGB-D Image Semantic Segmentation. RGB-D indoor scene parsing has been studied for years, and numerous methods have been proposed [7, 13, 45, 16, 31].  ... 
doi:10.1109/cvpr42600.2020.00405 dblp:conf/cvpr/Xiong0G020 fatcat:ijodulywevgandvwemvkznyt6a

Modern Augmented Reality: Applications, Trends, and Future Directions [article]

Shervin Minaee, Xiaodan Liang, Shuicheng Yan
2022 arXiv   pre-print
This work tries to provide an overview of modern augmented reality, from both application-level and technical perspective.  ...  Although it has been around for nearly fifty years, it has seen a lot of interest by the research community in the recent years, mainly because of the huge success of deep learning models for various computer  ...  ACKNOWLEDGMENTS We would like to thank Iasonas Kokkinos, Qi Pan, Lyric Kaplan, and Liz Markman for reviewing this work, and providing very helpful comments and suggestions.  ... 
arXiv:2202.09450v2 fatcat:x436ycnvxnhdpfdvhnxkzgbqce

Recent Advances of Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective

Wu Liu, Tao Mei
2022 ACM Computing Surveys  
Especially, we provide insightful analyses for the intrinsic connections and methods evolution from 2D to 3D pose estimation.  ...  Although there have been some works to summarize different approaches, it still remains challenging for researchers to have an in-depth view of how these approaches work from 2D to 3D.  ...  For multi-person scenes as shown in Fig. 4 , to estimate the 2D or 3D pose of each person, existing works exploit the top-down paradigm or bottom-up paradigm.  ... 
doi:10.1145/3524497 fatcat:4pbvntngrnfp7lqhcpjmy7p2fq

VCIP 2020 Index

2020 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)  
of Point Cloud and Multiple Views for 3D Shape Recognition Lei, Xuejing Noise-Aware Texture-Preserving Low-Light Enhancement Lei, Zhengchao Efficient Light Deep Network for Street Scene Parsing  ...  Hu, Menghan Special Cane with Visual Odometry for Real-time Indoor Navigation of Blind People Hu, Menghan Wearable Visually Assistive Device for Blind People to Appreciate Real-world Scene and  ... 
doi:10.1109/vcip49819.2020.9301896 fatcat:bdh7cuvstzgrbaztnahjdp5s5y

Disentangling 3D Prototypical Networks For Few-Shot Concept Learning [article]

Mihir Prabhudesai, Shamit Lal, Darshan Patil, Hsiao-Yu Tung, Adam W Harley, Katerina Fragkiadaki
2021 arXiv   pre-print
They are trained end-to-end self-supervised by predicting views in static scenes, alongside a small number of 3D object boxes.  ...  We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot  ...  Replica dataset provides high quality reconstructions for 18 indoor scenes. We use AI Habitat simulator (Manolis Savva* et al., 2019) to render multiview RGB-D data for these meshes.  ... 
arXiv:2011.03367v3 fatcat:bqxpkyamcnf53cvbgrravlgk4m

Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [article]

Jogendra Nath Kundu, Ambareesh Revanur, Govind Vitthal Waghmare, Rahul Mysore Venkatesh, R. Venkatesh Babu
2020 arXiv   pre-print
We aim to enhance the model's ability to perform beyond the limiting teacher network by enriching the latent-to-3D pose mapping using artificially synthesized multi-person 3D scene samples.  ...  We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation.  ...  To achieve this, we plan to distill the knowledge from a frozen teacher network which is trained for an auxiliary task of multi-person 2D landmark estimation.  ... 
arXiv:2008.01388v1 fatcat:rlfpgoy6vjayhmwyhhvihdizlm

Pedestrian Attribute Recognition by Joint Visual-semantic Reasoning and Knowledge Distillation

Qiaozhe Li, Xin Zhao, Ran He, Kaiqi Huang
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
An additional regularization term is proposed by distilling human parsing knowledge from a pre-trained teacher model to enhance feature representations.  ...  To achieve effective recognition, this paper presents a graph-based global reasoning framework to jointly model potential visual-semantic relations of attributes and distill auxiliary human parsing knowledge  ...  Experimental Results In contrast, our method has significantly improved the results by all metrics due to its effectiveness of distilling human parsing knowledge as the guidance for reasoning.  ... 
doi:10.24963/ijcai.2019/117 dblp:conf/ijcai/LiZHH19 fatcat:xsoktk6y5vh4xlb7deqe6zfdd4

RnR: Retrieval and Reprojection Learning Model for Camera Localization

S Yang, D Shi
2021 IEEE Access  
out camera calibration between the 2D image plane and the 3D scene.  ...  More precise localization is achieved by camera calibration between the 2D image and the 3D scene using a fully convolutional network.  ...  ACKNOWLEDGMENT This research is carried out at the National Engineering Laboratory for Big Data System Computing Technology, China.  ... 
doi:10.1109/access.2021.3061634 fatcat:ygwdr6jf2fcfxlbplfkify46ve