Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations [article]

Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein
2020 arXiv   pre-print
We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance.  ...  While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision.  ...  Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations - Supplementary Material - Vincent Sitzmann, Michael Zollhöfer  ... 
arXiv:1906.01618v2 fatcat:mdywjeydm5asbacdtbs7wmgj5u

Semantic Implicit Neural Scene Representations With Semi-Supervised Training [article]

Amit Kohli, Vincent Sitzmann, Gordon Wetzstein
2021 arXiv   pre-print
The recent success of implicit neural scene representations has presented a viable new method for how we capture and store 3D scenes.  ...  We explore two novel applications for this semantically aware implicit neural scene representation: 3D novel view and semantic label synthesis given only a single input RGB image or 2D label mask, as well  ...  Scene Representation Networks Scene Representation Networks are a continuous, 3D-structure aware neural scene representation.  ... 
arXiv:2003.12673v2 fatcat:pcijezws6vhznmz4ih5buuytva

BACON: Band-limited Coordinate Networks for Multiscale Scene Representation [article]

David B. Lindell, Dave Van Veen, Jeong Joon Park, Gordon Wetzstein
2022 arXiv   pre-print
Coordinate-based networks have emerged as a powerful tool for 3D representation and scene reconstruction.  ...  We demonstrate BACON for multiscale neural representation of images, radiance fields, and 3D scenes using signed distance functions and show that it outperforms conventional single-scale coordinate networks  ...  Emerging neural scene representations promise 3D-structure-aware, continuous, memory-efficient representations for parts [20, 21], objects [3, 6, 13, 22, 42, 52, 78], or scenes [15, 26, 55, 64, 66]  ... 
arXiv:2112.04645v2 fatcat:kp7nwouso5adlpo6hxkpqb56yq

In-Place Scene Labelling and Understanding with Implicit Scene Representation [article]

Shuaifeng Zhi, Tristan Laidlow, Stefan Leutenegger, Andrew J. Davison
2021 arXiv   pre-print
specific to the scene.  ...  We show the benefit of this approach when labels are either sparse or very noisy in room-scale scenes.  ...  Implicit 3D Representations There has been much promising recent work on using neural implicit scene representations.  ... 
arXiv:2103.15875v2 fatcat:dkjq7aafl5f2rbunnkwxbtn64q

GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering [article]

Alex Trevithick, Bo Yang
2020 arXiv   pre-print
We present a simple yet powerful implicit neural function that can represent and render arbitrarily complex 3D scenes in a single network only from 2D observations.  ...  The function models 3D scenes as a general radiance field, which takes a set of posed 2D images as input, constructs an internal representation for each 3D point of the scene, and renders the corresponding  ...  Details of the neural rendering layers and the volume rendering can be found in NeRF (Mildenhall et al., 2020).  ... 
arXiv:2010.04595v1 fatcat:peoapjsn7bgb3kfrqg7vbs5tei

Compositional Scene Representation Learning via Reconstruction: A Survey [article]

Jinyang Yuan, Tonglin Chen, Bin Li, Xiangyang Xue
2022 arXiv   pre-print
Because compositional scene representations abstract the concept of objects, performing visual scene analysis and understanding based on these representations could be easier and more interpretable.  ...  Visual scene representation learning is an important research problem in the field of computer vision.  ...  occlusion-aware online clustering.  ... 
arXiv:2202.07135v1 fatcat:dxowzvbhhrbujnhezpdln26adm

ROOTS: Object-Centric Representation and Rendering of 3D Scenes [article]

Chang Chen, Fei Deng, Sungjin Ahn
2021 arXiv   pre-print
Recent works achieve object-centric generation but without the ability to infer the representation, or achieve 3D scene representation learning but without object-centric compositionality.  ...  Therefore, learning to represent and render 3D scenes with object-centric compositionality remains elusive.  ...  ., 2020) learns object-aware 3D scene representations for generative adversarial networks (Goodfellow et al., 2014) .  ... 
arXiv:2006.06130v3 fatcat:xkoyvm7lknfr7d3erukozbv6gm

Learning Physical Graph Representations from Visual Scenes [article]

Daniel M. Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Joshua B. Tenenbaum, Daniel L.K. Yamins
2020 arXiv   pre-print
Convolutional Neural Networks (CNNs) have proved exceptional at learning representations for visual object categorization.  ...  We also describe PSGNet, a network architecture that learns to extract PSGs by reconstructing scenes through a PSG-structured bottleneck.  ...  PSGNet builds upon and extends all these ideas to learn a hierarchical, 3D-aware representation without supervision of scene structure. Methods Physical Scene Graphs.  ... 
arXiv:2006.12373v2 fatcat:buvy3iaywjabfeytxrssjbwxte

STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on Spatial Transformation Routing [article]

Wen-Cheng Chen, Min-Chun Hu, Chu-Song Chen
2021 arXiv   pre-print
Geometry-aware modules are widely applied in recent deep learning architectures for scene representation and rendering.  ...  The STR mechanism treats the spatial transformation as the message passing process, and the relation between the view poses and the routing weights is modeled by an end-to-end trainable neural network.  ...  Neural Scene Representation and Rendering Neural scene representation and rendering models learn implicit representations of the scenes by training an end-to-end neural network to predict the images of  ... 
arXiv:2108.03072v1 fatcat:khrvkui5gvbwdkfz5dfvzeqita

Decoding representations of scenes in the medial temporal lobes

Heidi M. Bonnici, Dharshan Kumaran, Martin J. Chadwick, Nikolaus Weiskopf, Demis Hassabis, Eleanor A. Maguire
2011 Hippocampus  
We observed that while information that enabled two highly similar scenes to be distinguished was widely distributed throughout the MTL, more distinct scene representations were present in the hippocampus  ...  Our findings provide evidence for a specific computational role for the hippocampus in sustaining detailed representations of complex scenes, and shed new light on how the information processing capacities  ...  Structural MRI A whole brain 3D FLASH sequence was acquired with a resolution of 1 mm × 1 mm × 1 mm.  ... 
doi:10.1002/hipo.20960 pmid:21656874 pmcid:PMC3470919 fatcat:xt4xy7n5zfatfkauhqm2vatbey

CodeMapping: Real-Time Dense Mapping for Sparse SLAM using Compact Scene Representations [article]

Hidenobu Matsuki, Raluca Scona, Jan Czarnowski, Andrew J. Davison
2021 arXiv   pre-print
We propose a novel dense mapping framework for sparse visual SLAM systems which leverages a compact scene representation.  ...  Our dense mapper can be used not only for local mapping but also globally consistent dense 3D reconstruction through TSDF fusion.  ...  [10] applied a Graph Neural Network to depth completion. Cheng et al. [21] proposed a semantic scene completion network for LiDAR point clouds for a large scale outdoor environment.  ... 
arXiv:2107.08994v1 fatcat:suf6x3hqnjgfdbj7awyj6nwr2y

Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation

Po-Yi Chen, Alexander H. Liu, Yen-Cheng Liu, Yu-Chiang Frank Wang
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Monocular depth estimation is a challenging task in scene understanding, with the goal to acquire the geometric properties of 3D space from 2D images.  ...  Moreover, our proposed model is able to perform region-aware depth estimation by enforcing semantics consistency between stereo pairs.  ...  [14] used a deep convolution neural network and continuous condition random field as patch-wise depth predictor to estimate the depth information. Eigen et al.  ... 
doi:10.1109/cvpr.2019.00273 dblp:conf/cvpr/ChenLLW19 fatcat:yxy47fjnnbchljkxlmge7fte3u

TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog [article]

Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li
2020 arXiv   pre-print
Index Terms: multimodal learning, audio-visual scene-aware dialog, neural machine translation, multi-task learning  ...  Audio Visual Scene-aware Dialog (AVSD) is a task to generate responses when discussing about a given video.  ...  Conclusion In this paper, we propose a Transformer-based Modal Translator (TMT) to learn the representations of multimodal sequence for Audio Visual Scene-aware Dialog.  ... 
arXiv:2010.10839v1 fatcat:gut3m5hybvh2ddviasoz7h7pwm

TMT: A Transformer-Based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-Aware Dialog

Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li
2020 Interspeech 2020  
Audio Visual Scene-aware Dialog (AVSD) is a task to generate responses when discussing about a given video.  ...  Inspired by Neural Machine Translation (NMT), we propose the Transformer-based Modal Translator (TMT) to learn the representations of the source modal sequence by translating the source modal sequence  ...  Conclusion In this paper, we propose a Transformer-based Modal Translator (TMT) to learn the representations of multimodal sequence for Audio Visual Scene-aware Dialog.  ... 
doi:10.21437/interspeech.2020-2359 dblp:conf/interspeech/LiJZL20 fatcat:o7yntqz3obe5bagebmirdhoyrm

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers [article]

Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian
2021 arXiv   pre-print
Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to indulge in a question-answer dialog with a human about the audio-visual  ...  To encode fine-grained visual information, we present a novel dynamic scene graph representation learning pipeline that consists of an intra-frame reasoning layer producing spatio-semantic graph representations  ...  Previous approaches to this problem used holistic video features produced by a generic 3D convolutional neural network (Carreira and Zisserman 2017), and either focused on extending attention models on  ... 
arXiv:2007.03848v2 fatcat:bqvz6lk3szfv7frgcq4fvfz2ji