3D-to-2D Distillation for Indoor Scene Parsing
[article]
2021
arXiv
pre-print
Indoor scene semantic parsing from RGB images is very challenging due to occlusions, object distortion, and viewpoint variations. ...
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during the training, so the 2D network can infer without requiring ...
The code can be found at https://github.com/liuzhengzhe/3D-to-2D-Distillation-for-Indoor-Scene-Parsing.
arXiv:2104.02243v2
fatcat:gxxz3xjtqvee3ophixmm4dfi5a
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
[chapter]
2018
Lecture Notes in Computer Science
Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes. ...
We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set of CAD models using a stochastic grammar model. ...
Fig. 1: Illustration of the proposed holistic 3D indoor scene parsing and reconstruction in an analysis-by-synthesis fashion. ...
doi:10.1007/978-3-030-01234-2_12
fatcat:n2lyd2g6sve6lartrjexgdal3q
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
[article]
2018
arXiv
pre-print
Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes. ...
We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set of CAD models using a stochastic grammar model. ...
Fig. 1: Illustration of the proposed holistic 3D indoor scene parsing and reconstruction in an analysis-by-synthesis fashion. ...
arXiv:1808.02201v1
fatcat:ulqbp66cnfbnxmqnu4np6enxji
Data Efficient 3D Learner via Knowledge Transferred from 2D Model
[article]
2022
arXiv
pre-print
Collecting and labeling the registered 3D point cloud is costly. As a result, 3D resources for training are typically limited in quantity compared to the 2D images counterpart. ...
Specifically, we utilize a strong and well-trained semantic segmentation model for 2D images to augment RGB-D images with pseudo-label. The augmented dataset can then be used to pre-train 3D models. ...
We use RGB-D images as the bridge to transfer the knowledge from a strong and well-trained 2D scene parsing network to 3D models. ...
arXiv:2203.08479v2
fatcat:4xhrrwld7ngs3kz4b6ry6ba364
Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective
[article]
2021
arXiv
pre-print
2D and 3D, and the complex multi-person scenarios. ...
In this paper, we provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem. We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks. ...
For multi-person scenes, to estimate the 2D or 3D pose of each person, existing works exploit the top-down paradigm or bottom-up paradigm. ...
arXiv:2104.11536v1
fatcat:tdag2jq2vjdrjekwukm5nu7l6a
Predictive and Semantic Layout Estimation for Robotic Applications in Manhattan Worlds
[article]
2018
arXiv
pre-print
The scheme can be run in an online manner to build water tight representations of the environment. ...
The system effectively speculates about room boundaries and free space regions which provides useful guidance to subsequent motion planning systems. ...
Note however that the system still analyzes a complete 3D point cloud and 3D trajectory to produce the distilled floor plan. ...
arXiv:1811.07442v1
fatcat:si5s2e3plnavrcrgt7mmp6ppma
The Fusion Strategy of 2D and 3D Information Based on Deep Learning: A Review
2021
Remote Sensing
Using 2D and 3D information fusion for the advantages of compensation and accuracy improvement has become a hot research topic. ...
Moreover, according to the methods included in this paper, the 2D information and 3D information of different methods come from various kinds of data. ...
Data sharing is not applicable to this article.
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/rs13204029
fatcat:onnjeqvwb5gsjcrhdaq6hiekru
Variational Context-Deformable ConvNets for Indoor Scene Parsing
2020
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Especially in indoor scenes, the large variation of object scales makes spatial-context an important factor for improving the segmentation performance. ...
Bayesian probabilistic modeling is introduced for the training of VCD module, which can make it continuous and more stable; 3) a perspective-aware guidance module is designed to take advantage of multi-modal ...
Related Work
RGB-D Image Semantic Segmentation. RGB-D indoor scene parsing has been studied for years, and numerous methods have been proposed [7, 13, 45, 16, 31]. ...
doi:10.1109/cvpr42600.2020.00405
dblp:conf/cvpr/Xiong0G020
fatcat:ijodulywevgandvwemvkznyt6a
Modern Augmented Reality: Applications, Trends, and Future Directions
[article]
2022
arXiv
pre-print
This work tries to provide an overview of modern augmented reality, from both application-level and technical perspective. ...
Although it has been around for nearly fifty years, it has seen a lot of interest by the research community in the recent years, mainly because of the huge success of deep learning models for various computer ...
ACKNOWLEDGMENTS We would like to thank Iasonas Kokkinos, Qi Pan, Lyric Kaplan, and Liz Markman for reviewing this work, and providing very helpful comments and suggestions. ...
arXiv:2202.09450v2
fatcat:x436ycnvxnhdpfdvhnxkzgbqce
Recent Advances of Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective
2022
ACM Computing Surveys
Especially, we provide insightful analyses for the intrinsic connections and methods evolution from 2D to 3D pose estimation. ...
Although there have been some works to summarize different approaches, it still remains challenging for researchers to have an in-depth view of how these approaches work from 2D to 3D. ...
For multi-person scenes, as shown in Fig. 4, to estimate the 2D or 3D pose of each person, existing works exploit the top-down paradigm or bottom-up paradigm. ...
doi:10.1145/3524497
fatcat:4pbvntngrnfp7lqhcpjmy7p2fq
VCIP 2020 Index
2020
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
of Point Clou and Multiple Views for 3D Shape Recognition
Lei, Xuejing
Noise-Aware Texture-Preserving Low-Light Enhancement
Lei, Zhengchao
Efficient Light Deep Network for Street Scene Parsing ...
Hu, Menghan
Special Cane with Visual Odometry for Real-tim Indoor Navigation of Blind People
Hu, Menghan
Wearable Visually Assistive Device for Blind People to Appreciate Real-world Scene and ...
doi:10.1109/vcip49819.2020.9301896
fatcat:bdh7cuvstzgrbaztnahjdp5s5y
Disentangling 3D Prototypical Networks For Few-Shot Concept Learning
[article]
2021
arXiv
pre-print
They are trained end-to-end self-supervised by predicting views in static scenes, alongside a small number of 3D object boxes. ...
We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot ...
Replica dataset provides high quality reconstructions for 18 indoor scenes. We use AI Habitat simulator (Manolis Savva* et al., 2019) to render multiview RGB-D data for these meshes. ...
arXiv:2011.03367v3
fatcat:bqxpkyamcnf53cvbgrravlgk4m
Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation
[article]
2020
arXiv
pre-print
We aim to enhance the model's ability to perform beyond the limiting teacher network by enriching the latent-to-3D pose mapping using artificially synthesized multi-person 3D scene samples. ...
We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation. ...
To achieve this, we plan to distill the knowledge from a frozen teacher network which is trained for an auxiliary task of multi-person 2D landmark estimation. ...
arXiv:2008.01388v1
fatcat:rlfpgoy6vjayhmwyhhvihdizlm
Pedestrian Attribute Recognition by Joint Visual-semantic Reasoning and Knowledge Distillation
2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
An additional regularization term is proposed by distilling human parsing knowledge from a pre-trained teacher model to enhance feature representations. ...
To achieve effective recognition, this paper presents a graph-based global reasoning framework to jointly model potential visual-semantic relations of attributes and distill auxiliary human parsing knowledge ...
Experimental Results In contrast, our method has significantly improved the results by all metrics due to its effectiveness of distilling human parsing knowledge as the guidance for reasoning. ...
doi:10.24963/ijcai.2019/117
dblp:conf/ijcai/LiZHH19
fatcat:xsoktk6y5vh4xlb7deqe6zfdd4
RnR: Retrieval and Reprojection Learning Model for Camera Localization
2021
IEEE Access
out camera calibration between the 2D image plane and the 3D scene. ...
More precise localization is achieved by camera calibration between the 2D image and the 3D scene using a fully convolutional network. ...
ACKNOWLEDGMENT This research is carried out at the National Engineering Laboratory for Big Data System Computing Technology, China. ...
doi:10.1109/access.2021.3061634
fatcat:ygwdr6jf2fcfxlbplfkify46ve