Object-Centric Learning with Slot Attention [article]

Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf
2020 arXiv   pre-print
We empirically demonstrate that Slot Attention can extract object-centric representations that enable generalization to unseen compositions when trained on unsupervised object discovery and supervised  ...  Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features.  ...  the paper, Mostafa Dehghani, Klaus Greff, Bernhard Schölkopf, Klaus-Robert Müller, Adam Kosiorek, and Peter Battaglia for helpful discussions, and Rishabh Kabra for advice regarding the DeepMind Multi-Object  ... 
arXiv:2006.15055v2 fatcat:2uvpbz754nftbnizjin6zeiyai

Learning Object-Centric Video Models by Contrasting Sets [article]

Sindy Löwe, Klaus Greff, Rico Jonschkowski, Alexey Dosovitskiy, Thomas Kipf
2020 arXiv   pre-print
Thus, this objective does not inherently push towards the emergence of object-centric representations in the slots.  ...  However, a fundamental problem with this approach is that the overall contrastive loss is the same for (i) representing a different object in each slot, as it is for (ii) (re-)representing the same object  ...  Object-centric learning with slot attention. arXiv preprint arXiv:2006.15055, 2020. [13] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton.  ... 
arXiv:2011.10287v1 fatcat:ain2y5dwuvhxzmky4qjpzu6oke

Illiterate DALL-E Learns to Compose [article]

Gautam Singh, Fei Deng, Sungjin Ahn
2022 arXiv   pre-print
In contrast, object-centric representation models like the Slot Attention model learn composable representations without the text prompt.  ...  In this paper, we propose a simple but novel slot-based autoencoding architecture, called SLATE, for combining the best of both worlds: learning object-centric representations that allow systematic generalization  ...  PRELIMINARIES OBJECT-CENTRIC REPRESENTATION LEARNING WITH PIXEL-MIXTURE DECODER A common framework for learning object-centric representations is via a form of auto-encoders (Locatello et al., 2020;  ... 
arXiv:2110.11405v3 fatcat:skhslagryjgunopv65ymgpifjq

Inductive Biases for Object-Centric Representations in the Presence of Complex Textures [article]

Samuele Papa, Ole Winther, Andrea Dittadi
2022 arXiv   pre-print
Understanding which inductive biases could be helpful for the unsupervised learning of object-centric representations of natural scenes is challenging.  ...  We find that methods that use a single module to reconstruct both the shape and visual appearance of each object learn more useful representations and achieve better object separation.  ...  ., 2019) and Slot Attention (Locatello et al., 2020), two popular and successful approaches for unsupervised object-centric learning.  ... 
arXiv:2204.08479v2 fatcat:fn57ewcum5cwlhr27taiyji3d4

Generalization and Robustness Implications in Object-Centric Learning [article]

Andrea Dittadi, Samuele Papa, Michele De Vita, Bernhard Schölkopf, Ole Winther, Francesco Locatello
2021 arXiv   pre-print
The idea behind object-centric representation learning is that natural scenes can better be modeled as compositions of objects and their relations as opposed to distributed representations.  ...  This inductive bias can be injected into neural networks to potentially improve systematic generalization and learning efficiency of downstream tasks in scenes with multiple objects.  ...  Object-centric learning with slot attention. arXiv preprint arXiv:2006.15055, 2020. [21] Eric Crawford and Joelle Pineau. Exploiting spatial invariance for scalable unsupervised object tracking.  ... 
arXiv:2107.00637v1 fatcat:l2bq3hjhvnbzxdybxfnyfwnqsa

Towards Self-Supervised Learning of Global and Object-Centric Representations [article]

Federico Baldassarre, Hossein Azizpour
2022 arXiv   pre-print
We discuss key aspects of learning structured object-centric representations with self-supervision and validate our insights through several experiments on the CLEVR dataset.  ...  Regarding the architecture, we confirm the importance of competition for attention-based object discovery, where each image patch is exclusively attended by one object.  ...  For object-centric learning, the representation function shall, in addition, output a set of vectors S = {s i } commonly termed slots or object tokens. Backbone.  ... 
arXiv:2203.05997v2 fatcat:bircpng5qzhyrjig2zkzi233mu

Unsupervised Discovery of Object Radiance Fields [article]

Hong-Xing Yu, Leonidas J. Guibas, Jiajun Wu
2022 arXiv   pre-print
We study the problem of inferring an object-centric scene representation from a single image, aiming to derive a representation that explains the image formation process, captures the scene's 3D nature  ...  Trained on multi-view RGB images without annotations, uORF learns to decompose complex scenes with diverse, textured background from a single image.  ...  Background-aware slot attention.  ... 
arXiv:2107.07905v2 fatcat:gpg5ofyc5jd25eex6omx3fquqa

Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation [article]

Yi Zhou, Hui Zhang, Hana Lee, Shuyang Sun, Pingjun Li, Yangguang Zhu, ByungIn Yoo, Xiaojuan Qi, Jae-Joon Han
2021 arXiv   pre-print
In this paper, inspired by object-centric learning which learns compact and robust object representations, we present Slot-VPS, the first end-to-end framework for this task.  ...  We encode all panoptic entities in a video, including both foreground instances and background semantics, with a unified representation called panoptic slots.  ...  Object-centric learning.  ... 
arXiv:2112.08949v1 fatcat:ykmn47fxgbgpncqzqpz7htifry

Unsupervised Discovery and Composition of Object Light Fields [article]

Cameron Smith, Hong-Xing Yu, Sergey Zakharov, Fredo Durand, Joshua B. Tenenbaum, Jiajun Wu, Vincent Sitzmann
2022 arXiv   pre-print
Dubbed Compositional Object Light Fields (COLF), our method enables unsupervised learning of object-centric neural scene representations, state-of-the-art reconstruction and novel view synthesis performance  ...  Here, we propose to represent objects in an object-centric, compositional scene representation as light fields.  ...  [26] proposed Slot Attention as an inference model in such slot-based approaches.  ... 
arXiv:2205.03923v1 fatcat:t2xruywzrbcrdndbmwjwfe2v7a

Conditional Object-Centric Learning from Video [article]

Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff
2022 arXiv   pre-print
Recent work on simple 2D and 3D datasets has shown that models with object-centric inductive biases can learn to segment and represent meaningful objects from the statistical structure of the data alone  ...  We introduce a sequential extension to Slot Attention which we train to predict optical flow for realistic looking synthetic scenes and show that conditioning the initial state of this model on a small  ...  We are further grateful to Rishabh Kabra for sharing the CATER (with masks) dataset and to Yi Yang, Misha Denil, Yusuf Aytar, and Claudio Fantacci for helping us get started with the Sketchy dataset.  ... 
arXiv:2111.12594v2 fatcat:2msznvqbfbh6npuxje2ns23qbi

Unsupervised Image Decomposition with Phase-Correlation Networks [article]

Angel Villar-Corrales, Sven Behnke
2022 arXiv   pre-print
Recently, different methods have been proposed to learn object-centric representations from data in an unsupervised manner.  ...  of a set of learned object prototypes.  ...  A first approach to object-centric decomposition combines VAEs with attention mechanisms to decompose a scene into object-centric representations.  ... 
arXiv:2110.03473v3 fatcat:n6zalqjazbgingx2pi3itdgneq

GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement [article]

Martin Engelcke, Oiwi Parker Jones, Ingmar Posner
2022 arXiv   pre-print
Advances in unsupervised learning of object-representations have culminated in the development of a broad range of methods for unsupervised object segmentation and interpretable object-centric scene generation  ...  These methods, however, are limited to simulated and real-world datasets with limited visual complexity.  ...  in an interpretable, object-centric fashion.  ... 
arXiv:2104.09958v3 fatcat:vw27tcismrgiflk2u4afqe2bme

Benchmarking Unsupervised Object Representations for Video Sequences [article]

Marissa A. Weis, Kashyap Chitta, Yash Sharma, Wieland Brendel, Matthias Bethge, Andreas Geiger, Alexander S. Ecker
2021 arXiv   pre-print
Recently, several methods have been proposed for unsupervised learning of object-centric representations.  ...  more robust object-centric video representations.  ...  Co-Reyes, and Michael Chang particularly with regards to applying OP3 to our experimental setting.  ... 
arXiv:2006.07034v2 fatcat:2rakkw62zzganjbttsajtravxi

Structured World Belief for Reinforcement Learning in POMDP [article]

Gautam Singh, Skand Peri, Junghyun Kim, Hyunseok Kim, Sungjin Ahn
2021 arXiv   pre-print
To synergize the benefits of SMC particles with object representations, we also propose a new object-centric dynamics model that considers the inductive bias of object permanence.  ...  In this paper, we propose Structured World Belief, a model for learning and inference of object-centric belief states.  ...  Object-centric learning with slot attention, 2020. Ma, X., Karkus, P., Hsu, D., and Lee, W. S. Particle filter recurrent neural networks. AAAI, 2019.  ... 
arXiv:2107.08577v1 fatcat:7s7pwcmfubbzzk23dagdttcocy

APEX: Unsupervised, Object-Centric Scene Segmentation and Tracking for Robot Manipulation [article]

Yizhe Wu, Oiwi Parker Jones, Martin Engelcke, Ingmar Posner
2021 arXiv   pre-print
Recent advances in unsupervised learning for object detection, segmentation, and tracking hold significant promise for applications in robotics.  ...  We thus introduce the Panda Pushing Dataset (P2D) which shows a Panda arm interacting with objects on a table in simulation and which includes ground-truth segmentation masks and object IDs for tracking  ...  We demonstrate however, that SCALOR struggles to learn object-centric representations on datasets with objects of widely varying sizes and textures as encountered in robot manipulation.  ... 
arXiv:2105.14895v2 fatcat:swkwv5y55vfibjv3yrgsqg3vay