
Multi-Object Representation Learning with Iterative Variational Inference [article]

Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Chris Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner
2020 arXiv   pre-print
We also show that, due to the use of iterative variational inference, our system is able to learn multi-modal posteriors for ambiguous inputs and extends naturally to sequences.  ...  Our method learns -- without supervision -- to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations.  ...  [Figure: Multi-dSprites dataset; odd rows show the image and object masks as determined by the model.]  ...
arXiv:1903.00450v3 fatcat:f7n2lww5xndtlb3kdt5n7ed7rm
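The iterative variational inference mentioned above can be illustrated with a deliberately tiny sketch: refining the variational posterior mean of a one-dimensional linear-Gaussian model by repeated gradient steps on the ELBO. This is a hedged illustration of the refinement idea only (the model, step size, and step count are invented for the example), not IODINE's architecture or update rule.

```python
# Toy iterative variational inference: x = z + noise, z ~ N(0, 1),
# with a variational posterior q(z) = N(mu, 1) refined by gradient ascent.
def refine_posterior(x, steps=20, lr=0.1):
    """Iteratively update the variational mean mu for q(z) = N(mu, 1)."""
    mu = 0.0  # initial guess
    for _ in range(steps):
        # d/dmu [ -0.5*(x - mu)**2 - 0.5*mu**2 ] = (x - mu) - mu
        grad = (x - mu) - mu
        mu += lr * grad
    return mu

# For this model the ELBO optimum is mu* = x / 2.
print(refine_posterior(1.0))  # approaches 0.5
```

IODINE-style models replace the closed-form gradient with a learned refinement network, but the loop structure (repeatedly updating posterior parameters from feedback) is the same idea.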

Unsupervised Video Decomposition using Spatio-temporal Iterative Inference [article]

Polina Zablotskaia, Edoardo A. Dominici, Leonid Sigal, Andreas M. Lehrmann
2020 arXiv   pre-print
Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning.  ...  We propose a novel spatio-temporal iterative inference framework that is powerful enough to jointly model complex multi-object representations and explicit temporal dependencies between latent variables  ...  Our approach builds upon a generative model of multi-object representations [17] and leverages elements of iterative amortized inference [32].  ...
arXiv:2006.14727v1 fatcat:pvk3z4jqe5fkzogaaf7o73pz6i

Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Nanbo Li, Cian Eastwood, Robert B. Fisher
2020 Neural Information Processing Systems  
To address this, we propose the Multi-View and Multi-Object Network (MulMON) -- a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views.  ...  In order to sidestep the main technical difficulty of the multi-object-multi-view scenario -- maintaining object correspondences across views -- MulMON iteratively updates the latent object representations for  ...  Broader Impact: In this paper, we presented a new method to learn object-centric representations of multi-object scenes.  ...
dblp:conf/nips/LiEF20 fatcat:tsh6xlrpjvfkdbypikcipt4nhu

LAVAE: Disentangling Location and Appearance [article]

Andrea Dittadi, Ole Winther
2019 arXiv   pre-print
We use amortized variational inference to train the generative model end-to-end.  ...  The learned representations of object location and appearance are fully disentangled, and objects are represented independently of each other in the latent space.  ...  On multi-MNIST and multi-dSprites data sets, LAVAE learns without supervision to correctly count and locate all objects in a scene.  ... 
arXiv:1909.11813v2 fatcat:vuvmmisusjbsbkemtsrx2xccnm

Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views [article]

Li Nanbo, Cian Eastwood, Robert B. Fisher
2021 arXiv   pre-print
To address this, we propose the Multi-View and Multi-Object Network (MulMON) -- a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views.  ...  In order to sidestep the main technical difficulty of the multi-object-multi-view scenario -- maintaining object correspondences across views -- MulMON iteratively updates the latent object representations  ...  Using a spatial mixture model [10] and iterative amortized inference [19], MulMON sidesteps the main technical difficulty of the multi-object-multi-view scenario -- maintaining object correspondence across  ...
arXiv:2111.07117v1 fatcat:7ekqkf33c5gp7g5m36tnvtpmtu

Multi-Prediction Deep Boltzmann Machines

Ian J. Goodfellow, Mehdi Mirza, Aaron C. Courville, Yoshua Bengio
2013 Neural Information Processing Systems  
...  solve different inference problems.  ...  We introduce the multi-prediction deep Boltzmann machine (MP-DBM).  ...  The choice of this type of variational learning combined with the underlying generalized pseudolikelihood objective makes an MP-DBM very well suited for solving approximate inference problems but not very  ...
dblp:conf/nips/GoodfellowMCB13 fatcat:yg55kjaoordhpgymg7conlgdiu

Duplicate Latent Representation Suppression for Multi-object Variational Autoencoders

Nanbo Li, Robert B. Fisher
2021 British Machine Vision Conference  
Built upon variational autoencoders (VAEs) [11], current approaches infer a set of latent object representations to interpret a scene observation (e.g. an image) under the assumption that each part (e.g  ...  Generative object-centric scene representation learning is crucial for structural visual scene understanding.  ...  image segmentation and object-based representation learning.  ...
dblp:conf/bmvc/LiF21 fatcat:bdfl5acntzalzaurbacv76biui

Improved Multimodal Deep Learning with Variation of Information

Kihyuk Sohn, Wenling Shang, Honglak Lee
2014 Neural Information Processing Systems  
Rather than learning with maximum likelihood, we train the model to minimize the variation of information.  ...  Deep learning has been successfully applied to multimodal representation learning problems, with a common strategy of learning joint representations that are shared across multiple modalities on top of  ...  The minimum variation of information objective enables learning good shared representations of multiple heterogeneous data modalities with better prediction of a missing input modality.  ...
dblp:conf/nips/SohnSL14 fatcat:tr2ms3poszbuvchdwpeqilfvsu
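For reference, the variation of information minimized above has a standard information-theoretic definition for two variables (here standing in for two modalities) X and Y; the paper's actual training loss is a variational treatment of this quantity, but the textbook identity is:

```latex
\mathrm{VI}(X, Y) = H(X \mid Y) + H(Y \mid X) = H(X) + H(Y) - 2\,I(X; Y)
```

Minimizing it pushes each modality to be predictable from the other, consistent with the snippet's claim about better prediction of a missing input modality.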

Associate Latent Encodings in Learning from Demonstrations

Hang Yin, Francisco Melo, Aude Billard, Ana Paiva
2017 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Both latent representations and associations of different modalities are proposed to be jointly learned through an adapted variational auto-encoder.  ...  The advantages of learning associative latent encodings are further highlighted with the examples of inferring upon incomplete input images.  ...  Acknowledgments: This work is partially funded by the Swiss National Center of Robotics Research and national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013  ...
doi:10.1609/aaai.v31i1.11040 fatcat:qqlx35hoezg5dlabxmcc7rqxiu

CodeNeRF: Disentangled Neural Radiance Fields for Object Categories [article]

Wonbong Jang, Lourdes Agapito
2021 arXiv   pre-print
CodeNeRF is an implicit 3D neural representation that learns the variation of object shapes and textures across a category and can be trained, from a set of posed images, to synthesize novel views of unseen  ...  We conduct experiments on the SRN benchmark, which show that CodeNeRF generalises well to unseen objects and achieves on-par performance with methods that require known camera pose at test time.  ...  Neural representations have also been used to learn deformation priors that encode the variation of object shapes across semantic categories using direct 3D supervision [7, 2, 17, 14] .  ... 
arXiv:2109.01750v1 fatcat:wllnvifnrbai3e65eqjc72uzzu

Deep Regression Bayesian Network and Its Applications [article]

Siqi Nie, Meng Zheng, Qiang Ji
2017 arXiv   pre-print
The major difficulty of learning and inference with deep directed models with many latent variables is the intractable inference due to the dependencies among the latent variables and the exponential number  ...  In this paper, we review different structures of deep directed generative models and the learning and inference algorithms associated with the structures.  ...  By exploiting its multi-level representation and the availability of the big data, deep learning has led to dramatic performance improvements for certain tasks.  ... 
arXiv:1710.04809v1 fatcat:bwlvhlwdbndnvjjvfdsw4mfnz4

Modeling Artistic Workflows for Image Generation and Editing [article]

Hung-Yu Tseng, Matthew Fisher, Jingwan Lu, Yijun Li, Vladimir Kim, Ming-Hsuan Yang
2020 arXiv   pre-print
Furthermore, for the editing scenario, we introduce an optimization process along with learning-based regularization to ensure the edited image produced by the model closely aligns with the originally  ...  Motivated by the above observations, we propose a generative model that follows a given artistic workflow, enabling both multi-stage image generation as well as multi-stage image editing of an existing  ...  We first train each network separately for 450,000 iterations, then jointly train all the networks in the workflow inference module for 450,000 iterations. Artwork generation.  ...
arXiv:2007.07238v1 fatcat:fll4lgz5qjdondnayo2ixrmck4

Planning Multi-Fingered Grasps as Probabilistic Inference in a Learned Deep Network [article]

Qingkai Lu, Kautilya Chenna, Balakumar Sundaralingam, Tucker Hermans
2018 arXiv   pre-print
We propose a novel approach to multi-fingered grasp planning leveraging learned deep neural network models.  ...  We train a convolutional neural network to predict grasp success as a function of both visual information of an object and grasp configuration.  ...  The same training setup is used for the patches-CNN, except we train for 60,000 iterations with the learning rate decreasing by 10x every 20,000 iterations.  ...
arXiv:1804.03289v1 fatcat:trqps4vy25bj7j23pv4z7zan2i
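The planning step described in this entry, optimizing a grasp configuration against a learned success predictor, can be sketched with a toy stand-in scorer. The quadratic `success_score` below is hypothetical, replacing the paper's trained CNN, and the optimizer is plain finite-difference gradient ascent rather than the paper's inference procedure.

```python
# Toy "grasp planning as inference": maximize a learned success score
# over the grasp configuration. The scorer is a made-up quadratic that
# peaks at a known best configuration, so convergence is easy to check.
def success_score(config, best=(0.3, -0.2)):
    """Stand-in for p(success | object, config); peaks at `best`."""
    return -sum((c - b) ** 2 for c, b in zip(config, best))

def plan_grasp(init, steps=100, lr=0.1, eps=1e-4):
    """Gradient ascent on the score, using finite differences."""
    config = list(init)
    for _ in range(steps):
        for i in range(len(config)):
            up, down = list(config), list(config)
            up[i] += eps
            down[i] -= eps
            grad = (success_score(up) - success_score(down)) / (2 * eps)
            config[i] += lr * grad
    return config

print(plan_grasp([0.0, 0.0]))  # converges toward (0.3, -0.2)
```

In the real system the gradient flows through the trained network with respect to the grasp-configuration inputs, but the structure (ascend a learned success landscape from an initial grasp) is the same.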

Graphite: Iterative Generative Modeling of Graphs [article]

Aditya Grover, Aaron Zweig, Stefano Ermon
2019 arXiv   pre-print
Our model parameterizes variational autoencoders (VAE) with graph neural networks, and uses a novel iterative graph refinement strategy inspired by low-rank approximations for decoding.  ...  Finally, we derive a theoretical connection between message passing in graph neural networks and mean-field variational inference.  ...  Acknowledgements This research has been supported by Siemens, a Future of Life Institute grant, NSF grants (#1651565, #1522054, #1733686), ONR (N00014-19-1-2145), AFOSR (FA9550-19-1-0024), and an Amazon AWS Machine Learning  ... 
arXiv:1803.10459v4 fatcat:dl6rnkwnrnfova3jzu24psqzie

Knowledge-Guided Object Discovery with Acquired Deep Impressions [article]

Jinyang Yuan, Bin Li, Xiangyang Xue
2021 arXiv   pre-print
In this framework, the model first acquires knowledge from scene images containing a single object in a supervised manner, and then continues to learn from novel multi-object scene images which may contain  ...  By memorizing impressions of objects into parameters of neural networks and applying the generative replay strategy, the learned knowledge can be reused to generate images with pseudo-annotations and in  ...  2017SHZDZX01, 2018SHZDZX01), Shanghai Research and Innovation Functional Program (17DZ2260900), and the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning  ... 
arXiv:2103.10611v1 fatcat:vduyz6ibonatngp3stdkyo7dqy
Showing results 1–15 of 58,160.