1,341 Hits in 5.4 sec

Progressive 3D Scene Understanding with Stacked Neural Networks

Youcheng Song, Zhengxing Sun
2018 Pacific Conference on Computer Graphics and Applications  
The scene understanding task is decomposed into several different but related tasks, and semantic objects are progressively separated from coarse to fine.  ...  The former network segments the 3D scene at a coarser level and passes the result as context to the latter one for a finer-grained segmentation.  ...  Inspired by this coarse-to-fine understanding process, we can use several networks to segment 3D scene progressively with refined semantic classes for 3D scene understanding task.  ... 
doi:10.2312/pg.20181280 dblp:conf/pg/SongS18 fatcat:vknckdkmiber3fmij4wvc7ikcq

Label Refinement Network for Coarse-to-Fine Semantic Segmentation [article]

Md Amirul Islam, Shujon Naha, Mrigank Rochan, Neil Bruce, Yang Wang
2017 arXiv   pre-print
The segmentation labels at a coarse resolution are used together with convolutional features to obtain finer resolution segmentation labels.  ...  We propose a novel network architecture called the label refinement network that predicts segmentation labels in a coarse-to-fine fashion at several resolutions.  ...  There are 37 indoor scene classes with corresponding segmentation labels (background is not considered as a class and is ignored during training and testing).  ... 
arXiv:1703.00551v1 fatcat:4xupd6go7ff6xdoxhk5awbgmjy
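The coarse-to-fine label refinement described in this entry can be sketched as: upsample the coarse segmentation scores and combine them with higher-resolution convolutional features to produce finer labels. A minimal NumPy sketch — the shapes, the nearest-neighbour upsampling, and the linear mixing rule are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def upsample2x(scores):
    """Nearest-neighbour 2x upsampling of an (H, W, C) score map."""
    return scores.repeat(2, axis=0).repeat(2, axis=1)

def refine(coarse_scores, fine_features, weight):
    """Combine upsampled coarse label scores with finer-resolution
    features to produce refined scores (illustrative linear mixing)."""
    up = upsample2x(coarse_scores)       # (2H, 2W, C)
    return up + fine_features @ weight   # features: (2H, 2W, F), weight: (F, C)

rng = np.random.default_rng(0)
coarse = rng.normal(size=(4, 4, 3))      # coarse 4x4 scores over 3 classes
features = rng.normal(size=(8, 8, 5))    # finer 8x8 feature map
W = rng.normal(size=(5, 3))

refined = refine(coarse, features, W)
labels = refined.argmax(axis=-1)         # per-pixel class at the finer scale
```

Stacking several such stages yields predictions at progressively finer resolutions, each conditioned on the previous, coarser labeling.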

Semantic Bottleneck Scene Generation [article]

Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic
2019 arXiv   pre-print
For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts.  ...  We assume pixel-wise segmentation labels are available during training and use them to learn the scene structure.  ...  We thank Marvin Ritter for help with issues related to the compare gan library [27] . We are grateful to the members of BAIR for fruitful discussions.  ... 
arXiv:1911.11357v1 fatcat:id7o6lwt6bejfcs2tlnxt2rrb4

SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans [article]

Angela Dai, Christian Diller, Matthias Nießner
2020 arXiv   pre-print
Combined with a new 3D sparse generative neural network architecture, our method is able to predict highly-detailed surfaces in a coarse-to-fine hierarchical fashion, generating 3D scenes at 2cm resolution  ...  To achieve self-supervision, we remove frames from a given (incomplete) 3D scan in order to make it even more incomplete; self-supervision is then formulated by correlating the two levels of partialness  ...  Our self-supervision approach using loss masking enables more complete scene prediction than direct supervision using the target RGB-D scan, particularly in regions where occlusions commonly occur.  ... 
arXiv:1912.00036v2 fatcat:icywwuxxqjbuze2bxe6xbwnysy
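The loss-masking idea behind this self-supervised formulation can be illustrated simply: the target scan is itself incomplete, so the completion loss is evaluated only on voxels actually observed in the target, and the network is never penalised for filling in unobserved regions. A hedged NumPy sketch — the L1 error and the dense TSDF-like grid are stand-in assumptions, not SG-NN's sparse representation:

```python
import numpy as np

def masked_completion_loss(pred, target, observed_mask):
    """L1 completion loss over only the voxels observed in the
    (itself incomplete) target scan; unobserved voxels are masked
    out so completing them is never penalised."""
    diff = np.abs(pred - target) * observed_mask
    return diff.sum() / max(observed_mask.sum(), 1)

rng = np.random.default_rng(1)
pred = rng.normal(size=(8, 8, 8))     # predicted TSDF-like grid
target = rng.normal(size=(8, 8, 8))   # incomplete target scan
mask = rng.random((8, 8, 8)) < 0.6    # voxels actually observed in the target

loss = masked_completion_loss(pred, target, mask)
```

The input is made even more incomplete by removing frames from the scan, so the observed voxels of the less-complete input form a strict subset of the target's, giving a supervision signal without any synthetic ground truth.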

Line Segment Detection Using Transformers without Edges [article]

Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu
2021 arXiv   pre-print
We equip Transformers with a multi-scale encoder/decoder strategy to perform fine-grained line segment detection under a direct endpoint distance loss.  ...  This loss term is particularly suitable for detecting geometric structures such as line segments that are not conveniently represented by the standard bounding box representations.  ...  For each row, we show how the same line entity predicts line segments with the same property in three different indoor/outdoor scenes.  ...
arXiv:2101.01909v2 fatcat:keli33ufurdq7l2opoywlowjxa
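The endpoint distance loss named in this entry contrasts with box-based regression: a line segment is compared to ground truth directly through the distance between corresponding endpoints. A minimal NumPy sketch under stated assumptions — Euclidean distance per endpoint and symmetry over the two endpoint orderings are illustrative choices, not necessarily the paper's exact formulation:

```python
import numpy as np

def endpoint_distance_loss(pred, gt):
    """Direct endpoint distance between predicted and ground-truth
    line segments, taking the lower cost of the two endpoint
    orderings since segment (a, b) equals (b, a).
    pred, gt: (N, 2, 2) arrays of (endpoint, xy) coordinates."""
    d_same = np.linalg.norm(pred - gt, axis=-1).sum(axis=-1)
    d_swap = np.linalg.norm(pred - gt[:, ::-1, :], axis=-1).sum(axis=-1)
    return np.minimum(d_same, d_swap).mean()

pred = np.array([[[0.0, 0.0], [1.0, 1.0]]])
gt   = np.array([[[1.0, 1.0], [0.0, 0.0]]])  # same segment, endpoints reversed
loss = endpoint_distance_loss(pred, gt)
```

Unlike an axis-aligned bounding box, this penalty distinguishes the two diagonals of the same box, which is exactly the ambiguity that makes boxes a poor fit for line segments.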

Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes [article]

Haiyan Wang, Xuejian Rong, Liang Yang, Jinglun Feng, Jizhong Xiao, Yingli Tian
2020 arXiv   pre-print
To alleviate this issue, we propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision.  ...  The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation, especially for scenes in the wild with varieties of different objects.  ...  Then the following two Unets, CoarseNet and RefineNet, are used to generate the coarse-to-fine RGB images. Novel views can also be generated by taking the virtual tours for the total scene.  ... 
arXiv:2004.12498v2 fatcat:5mr6uuli6baixai7bjud24heda

SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans

Angela Dai, Christian Diller, Matthias Nießner
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Key to our approach is its self-supervised formulation, enabling training solely on real-world, incomplete scans.  ...  This not only obviates the need for synthetic ground truth, but is also capable of generating more complete scenes than any single target scene seen during training.  ...  Our self-supervision approach using loss masking enables more complete scene prediction than direct supervision using the target RGB-D scan, particularly in regions where occlusions commonly occur.  ... 
doi:10.1109/cvpr42600.2020.00093 dblp:conf/cvpr/DaiDN20 fatcat:pmczslqzwng6vgcjik74r7dcne

The Reasonable Effectiveness of Synthetic Visual Data

Adrien Gaidon, Antonio Lopez, Florent Perronnin
2018 International Journal of Computer Vision  
Therefore, to tackle more challenging tasks, such as video scene understanding, progress is needed not only on the algorithmic and hardware fronts but also on the data front, both for learning and quantitative  ...  The recent successes in many visual recognition tasks, such as image classification, object detection, and semantic segmentation can be attributed in large part to three factors: (i) advances in end-to-end  ...  It is shown experimentally that the proposed coarse-to-fine approach can achieve millimetric relative localization without a single real-world training image. • In "3D interpreter networks for viewer-centered  ... 
doi:10.1007/s11263-018-1108-0 fatcat:53z6bysvyvgclmz3ywxohlxqca

3D Semantic Scene Completion: a Survey [article]

Luis Roldao, Raoul de Charette, Anne Verroust-Blondet
2021 arXiv   pre-print
Semantic Scene Completion (SSC) aims to jointly estimate the complete geometry and semantics of a scene, assuming partial sparse input.  ...  Specifically, the challenge of SSC lies in the ambiguous completion of large unobserved areas and the weak supervision signal of the ground truth.  ...  Coarse-to-fine (b), similarly to multi-scale, relies on multiple size predictions, but trains in a multi-stage coarse-to-fine manner.  ...
arXiv:2103.07466v3 fatcat:swz4azlznre3laziatls6sdrfm

Gated Feedback Refinement Network for Coarse-to-Fine Dense Semantic Image Labeling [article]

Md Amirul Islam, Mrigank Rochan, Shujon Naha, Neil D. B. Bruce, and Yang Wang
2018 arXiv   pre-print
We first propose a network architecture called Label Refinement Network (LRN) that predicts segmentation labels in a coarse-to-fine fashion at several spatial resolutions.  ...  Initially, G-FRNet makes a coarse-grained prediction which it progressively refines to recover details by effectively integrating local and global contextual information during the refinement stages.  ...  Our model produces segmentation labels in a coarse-to-fine manner. The segmentation labels at coarse levels are used to progressively refine the labeling produced at finer levels.  ... 
arXiv:1806.11266v1 fatcat:nzfn3zaxvvbe7ajtxuztzb5a2y

Monocular spherical depth estimation with explicitly connected weak layout cues

Nikolaos Zioulis, Federico Alvarez, Dimitrios Zarpalas, Petros Daras
2022 ISPRS Journal of Photogrammetry and Remote Sensing
Recently, with the availability of appropriate datasets, there has also been progress in depth estimation from a single omnidirectional image.  ...  Spherical cameras capture scenes in a holistic manner and have been used for room layout estimation.  ...  Multi-scale Supervision We supervise both tasks, with the estimated depth maps being supervised in three scales s ∈ {0, 1, 2} from coarse to fine, resulting in the following combined loss function: L =  ... 
doi:10.1016/j.isprsjprs.2021.10.016 fatcat:zoz6hrijybejtfzhmpkqcxksvu
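The snippet above truncates the combined loss, but its stated structure — depth supervised at three scales s ∈ {0, 1, 2} from coarse to fine — can be sketched as a weighted sum of per-scale errors. A hedged NumPy sketch; the per-scale L1 error and uniform weights are assumptions, since the paper's actual terms are cut off in the excerpt:

```python
import numpy as np

def multi_scale_loss(preds, targets, weights):
    """Combined loss over coarse-to-fine scales s in {0, 1, 2}:
    a weighted sum of per-scale L1 depth errors."""
    return sum(w * np.abs(p - t).mean()
               for w, p, t in zip(weights, preds, targets))

rng = np.random.default_rng(2)
targets = [rng.random((4 * 2**s, 8 * 2**s)) for s in range(3)]  # coarse to fine
preds = [t + 0.1 for t in targets]                              # uniformly off by 0.1
loss = multi_scale_loss(preds, targets, weights=[1.0, 1.0, 1.0])
```

Supervising every scale rather than only the finest gives the coarse stages their own gradient signal, which is the usual motivation for this kind of deep supervision.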

Semantic Foggy Scene Understanding with Synthetic Data

Christos Sakaridis, Dengxin Dai, Luc Van Gool
2018 International Journal of Computer Vision  
For evaluation, we present Foggy Driving, a dataset with 101 real-world images depicting foggy driving scenes, which come with ground truth annotations for semantic segmentation and object detection.  ...  Extensive experiments show that 1) supervised learning with our synthetic data significantly improves the performance of state-of-the-art CNN for SFSU on Foggy Driving; 2) our semi-supervised learning  ...  Acknowledgements The authors would like to thank Kevis Maninis for useful discussions. This work is funded by Toyota Motor Europe via the research project TRACE-Zürich.  ... 
doi:10.1007/s11263-018-1072-8 fatcat:anr5t3jyfrewlkcszwor6a6zyq

Multi-scale iterative refinement network for RGB-D salient object detection

Ze-yu Liu, Jian-wei Liu, Xin Zuo, Ming-fei Hu
2021 Engineering Applications of Artificial Intelligence
In this paper, we begin by introducing a top-down and bottom-up iterative refinement architecture to leverage multi-scale features, and then devise an attention based fusion module (ABF) to address cross-modal  ...  However, salient visual cues appear in various scales and resolutions of RGB images due to semantic gaps at different feature levels.  ...  We use iterative refinement to progressively refine the salient maps reinforced by coarse-to-fine supervision.  ...
doi:10.1016/j.engappai.2021.104473 fatcat:2dj6azts2fajlju7c55mnpw5sq

RIO: 3D Object Instance Re-Localization in Changing Indoor Environments [article]

Johanna Wald, Armen Avetisyan, Nassir Navab, Federico Tombari, Matthias Nießner
2019 arXiv   pre-print
Each scene includes several objects whose positions change over time, together with ground truth annotations of object instances and their respective 6DoF mappings among re-scans.  ...  We consider RIO a particularly important task in 3D vision since it enables a wide range of practical applications, including AI-assistants or robots that are asked to find a specific object in a 3D scene  ...  Acknowledgment We would like to thank the volunteers who helped with 3D scanning, all expert annotators, as well as Jürgen Sturm, Tom Funkhouser and Maciej Halber for fruitful discussions.  ... 
arXiv:1908.06109v1 fatcat:wzcalwtmffhjrmmbjwlc3v4ubi

Fast Point Voxel Convolution Neural Network with Selective Feature Fusion for Point Cloud Semantic Segmentation [article]

Xu Wang, Yuyan Li, Ye Duan
2021 arXiv   pre-print
For the point branch, we use Multi-Layer Perceptron (MLP) to extract fine-detailed point-wise features. Outputs from these two branches are adaptively fused via a feature selection module.  ...  We evaluate our method on popular point cloud datasets for object classification and semantic segmentation tasks.  ...  Indoor Scene Segmentation Data and Implementation Details We conduct experiments on S3DIS [1] for large-scale indoor scene segmentation.  ... 
arXiv:2109.11614v1 fatcat:lyw4o4qgrrgfhf4n2hiq35i7ky
Showing results 1 — 15 out of 1,341 results