1,665 Hits in 5.2 sec

Learning Robust State Abstractions for Hidden-Parameter Block MDPs [article]

Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau
2021 arXiv   pre-print
Hidden-Parameter Markov Decision Processes (HiP-MDPs) explicitly model this structure to improve sample efficiency in multi-task settings.  ...  We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.  ...  Instead, we propose to study the multi-task reinforcement learning setting by framing it as a structured super-MDP with a shared state space and universal dynamics model controlled by a task-specific hidden  ... 
arXiv:2007.07206v4 fatcat:mdp3x6s6ovf5znhb2oiv56y5fy
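The hidden-parameter structure this entry describes — a shared state space and dynamics form, with a task-specific hidden parameter that is fixed per task and never observed by the agent — can be sketched as follows. All names (`HiPMDPFamily`, `theta`) and the toy linear dynamics are illustrative assumptions, not the paper's model.

```python
import random

class HiPMDPFamily:
    """Samples task instances that share dynamics form but differ in theta."""

    def __init__(self, theta_low=0.5, theta_high=1.5, seed=0):
        self.rng = random.Random(seed)
        self.theta_low, self.theta_high = theta_low, theta_high

    def sample_task(self):
        # theta is fixed for the task's duration and hidden from the agent
        theta = self.rng.uniform(self.theta_low, self.theta_high)
        return HiPMDPTask(theta)

class HiPMDPTask:
    def __init__(self, theta):
        self.theta = theta  # hidden parameter controlling the dynamics
        self.state = 0.0

    def step(self, action):
        # shared dynamics form across all tasks; only theta varies
        self.state += self.theta * action
        reward = -abs(self.state - 1.0)  # shared reward structure
        return self.state, reward

family = HiPMDPFamily(seed=42)
task_a, task_b = family.sample_task(), family.sample_task()
# the same action yields different next states under different hidden thetas
sa, _ = task_a.step(1.0)
sb, _ = task_b.step(1.0)
print(sa != sb)
```

The point of the sketch is the split: everything except `theta` is shared across tasks, which is what makes multi-task state abstraction over the family possible.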

RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning [article]

Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel
2016 arXiv   pre-print
Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data.  ...  Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials.  ...  This research was funded in part by ONR through a PECASE award. Yan Duan was also supported by a Berkeley AI Research lab Fellowship and a Huawei Fellowship.  ... 
arXiv:1611.02779v2 fatcat:5uies6uzlnhwpdmjwx3ofnz4oq
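The RL^2 interaction loop the abstract describes can be sketched minimally: a recurrent policy receives (observation, previous action, previous reward, done flag) at each step, and its hidden state is preserved across episodes within a trial so early experience informs later episodes. The linear tanh "RNN" and toy environment below are illustrative stand-ins for the GRU and benchmark tasks used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(8, 8))  # hidden-to-hidden weights
W_x = rng.normal(scale=0.1, size=(8, 4))  # input-to-hidden weights

def rnn_step(h, obs, prev_action, prev_reward, done):
    # input is the (obs, action, reward, termination) tuple, as in RL^2
    x = np.array([obs, prev_action, prev_reward, float(done)])
    return np.tanh(W_h @ h + W_x @ x)

h = np.zeros(8)                  # reset only at the START of a trial
for episode in range(3):         # several episodes of the SAME sampled MDP
    obs, prev_a, prev_r, done = 0.0, 0.0, 0.0, False
    for t in range(5):
        h = rnn_step(h, obs, prev_a, prev_r, done)
        prev_a = float(h[0] > 0)          # toy policy read-out
        obs, prev_r = obs + prev_a, -1.0  # toy environment response
    # NOTE: h is deliberately NOT reset between episodes -- carrying the
    # hidden state across episode boundaries is the core of RL^2

print(h.shape)
```

Training the RNN weights (by an outer-loop policy-gradient method, in the paper) is what makes the recurrence behave like a learned, "fast" RL algorithm.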

A Simple Neural Attentive Meta-Learner [article]

Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel
2018 arXiv   pre-print
In response, recent work in meta-learning proposes training a meta-learner on a distribution of similar tasks, in the hopes of generalization to novel but related tasks by learning a high-level strategy  ...  On all tasks, in both supervised and reinforcement learning, SNAIL attains state-of-the-art performance by significant margins.  ...  D REINFORCEMENT LEARNING: ABLATIONS Here we conduct a few ablations on RL tasks: we explore whether an agent relying only on TC layers or only on attention layers can solve the multi-armed bandit or MDP  ... 
arXiv:1707.03141v3 fatcat:ign7zqjvunbtdly4eyoma7fzkq

Invariant Causal Prediction for Block MDPs [article]

Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup
2020 arXiv   pre-print
In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but  ...  Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.  ...  Multi-Task Reinforcement Learning Teh et al.  ... 
arXiv:2003.06016v2 fatcat:hnqf7cfkergp3fsaoi6lbsxa6u

Multi-Task Reinforcement Learning with Context-based Representations [article]

Shagun Sodhani, Amy Zhang, Joelle Pineau
2021 arXiv   pre-print
The benefit of multi-task learning over single-task learning relies on the ability to use relations across tasks to improve performance on any single task.  ...  While this metadata can be useful for improving multi-task learning performance, effectively incorporating it can be an additional challenge.  ...  ., 2020b) 3 , as a natural instantiation of a BC-MDP.  ... 
arXiv:2102.06177v2 fatcat:gmwlp2lwavhi7itxkok7dj6tdy

Generalized Hidden Parameter MDPs Transferable Model-based RL in a Handful of Trials [article]

Christian F. Perez, Felipe Petroski Such, Theofanis Karaletsos
2020 arXiv   pre-print
We propose Generalized Hidden Parameter MDPs (GHP-MDPs) that describe a family of MDPs where both dynamics and reward can change as a function of hidden parameters that vary across tasks.  ...  The GHP-MDP augments model-based RL with latent variables that capture these hidden parameters, facilitating transfer across tasks.  ...  Generalized Hidden Parameter MDPs We denote a set of tasks/MDPs with transition dynamics T η and rewards R η that are fully described by hidden parameters η as a Generalized Hidden Parameter MDP (GHP-MDP  ... 
arXiv:2002.03072v1 fatcat:wl3wavxahnfl7avjjin7d7e6oy
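The GHP-MDP idea in this snippet — dynamics T_η and reward R_η both governed by a hidden parameter η, with a model-based learner inferring η from a handful of transitions in a new task — can be sketched under simplifying assumptions. The linear dynamics and least-squares inference below are illustrative; the paper uses learned neural dynamics models with latent variables.

```python
import numpy as np

def dynamics(s, a, eta):
    return s + eta * a            # T_eta: transitions depend on hidden eta

def reward(s, eta):
    return -(s - eta) ** 2        # R_eta: reward also depends on eta

# collect a handful of transitions from a new task with unknown eta
eta_true = 0.7
data = [(s, a, dynamics(s, a, eta_true))
        for s, a in [(0.0, 1.0), (0.7, -1.0), (0.2, 0.5)]]

# infer eta by least squares over (s' - s) = eta * a
A = np.array([[a] for _, a, _ in data])
b = np.array([s_next - s for s, _, s_next in data])
eta_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(round(float(eta_hat[0]), 3))
```

Once η is identified, planning proceeds in the recovered model, which is what allows transfer "in a handful of trials": only the low-dimensional η must be re-estimated per task, not the whole model.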

Generalized Hidden Parameter MDPs: Transferable Model-Based RL in a Handful of Trials

Christian Perez, Felipe Petroski Such, Theofanis Karaletsos
2020 Proceedings of the AAAI Conference on Artificial Intelligence
We propose Generalized Hidden Parameter MDPs (GHP-MDPs) that describe a family of MDPs where both dynamics and reward can change as a function of hidden parameters that vary across tasks.  ...  The GHP-MDP augments model-based RL with latent variables that capture these hidden parameters, facilitating transfer across tasks.  ...  Generalized Hidden Parameter MDPs We denote a set of tasks/MDPs with transition dynamics T η and rewards R η that are fully described by hidden parameters η as a Generalized Hidden Parameter MDP (GHP-MDP  ... 
doi:10.1609/aaai.v34i04.5989 fatcat:iqzcfgtmindg3hnhqnrfhuszse

Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes

Taylor Killian, Samuel Daulton, George Konidaris, Finale Doshi-Velez
2017 Advances in Neural Information Processing Systems  
We introduce a new formulation of the Hidden Parameter Markov Decision Process (HiP-MDP), a framework for modeling families of related tasks using low-dimensional latent embeddings.  ...  Our new framework correctly models the joint uncertainty in the latent parameters and the state space.  ...  [3] assume it is given and the specific variations of the parameters are learned. Also related are multi-task approaches that train a single model for multiple tasks simultaneously [5, 7] .  ... 
pmid:31656388 pmcid:PMC6814194 fatcat:r53ctldyojbjfca2cz476zs6ay

Robust and Efficient Transfer Learning with Hidden-Parameter Markov Decision Processes [article]

Taylor Killian, Samuel Daulton, George Konidaris, Finale Doshi-Velez
2017 arXiv   pre-print
We introduce a new formulation of the Hidden Parameter Markov Decision Process (HiP-MDP), a framework for modeling families of related tasks using low-dimensional latent embeddings.  ...  Our new framework correctly models the joint uncertainty in the latent parameters and the state space.  ...  [3] assume it is given and the specific variations of the parameters are learned. Also related are multi-task approaches that train a single model for multiple tasks simultaneously [5, 7] .  ... 
arXiv:1706.06544v3 fatcat:ddfcbj7snffa5acbp2fgngxtha

Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations

Finale Doshi-Velez, George Konidaris
2016 IJCAI International Joint Conference on Artificial Intelligence  
We introduce the Hidden Parameter Markov Decision Process (HiP-MDP), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors, and introduce a semiparametric  ...  We show that a learned HiP-MDP rapidly identifies the dynamics of new task instances in several settings, flexibly adapting to task variation.  ...  Thus, we could always learn to solve each HiP-MDP instance as its own distinct MDP. Second, the parameter vector θ b is fixed for the duration of the task, and thus the hidden state has no dynamics.  ... 
pmid:28603402 pmcid:PMC5466173 fatcat:2qn7zqtkzrepnpfhy5q3iwh3ee

Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations [article]

Finale Doshi-Velez, George Konidaris
2013 arXiv   pre-print
We introduce the Hidden Parameter Markov Decision Process (HiP-MDP), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors, and introduce a semiparametric  ...  In the control setting, we show that a learned HiP-MDP rapidly identifies the dynamics of a new task instance, allowing an agent to flexibly adapt to task variations.  ...  Thus, we could always learn to solve each HiP-MDP instance as its own distinct MDP. Second, the parameter vector θ b is fixed for the duration of the task, and thus the hidden state has no dynamics.  ... 
arXiv:1308.3513v1 fatcat:ddzfwarcnnfjradolbqmg2bhqy
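The semiparametric transition structure described in these two HiP-MDP entries — shared basis functions combined by a low-dimensional, task-specific weight vector θ_b that is fixed for the task's duration — can be sketched as below. The particular basis functions and dimensions are assumptions for illustration only.

```python
import numpy as np

def basis(s, a):
    # shared nonlinear basis functions, common to every task instance
    return np.array([s, a, s * a])

def transition(s, a, theta_b):
    # next state = theta_b-weighted combination of the shared bases;
    # theta_b is fixed per task, so the hidden state has no dynamics
    return float(theta_b @ basis(s, a))

theta_task1 = np.array([1.0, 0.5, 0.0])   # one task instance
theta_task2 = np.array([1.0, -0.5, 0.1])  # same bases, different weights
s, a = 1.0, 2.0
print(transition(s, a, theta_task1), transition(s, a, theta_task2))
```

Identifying a new task then reduces to estimating the few entries of θ_b, which is why a learned HiP-MDP adapts to new instances quickly.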

Learning Discrete State Abstractions With Deep Variational Inference [article]

Ondrej Biza, Robert Platt, Jan-Willem van de Meent, Lawson L. S. Wong
2021 arXiv   pre-print
Through this learned discrete abstract model, we can efficiently plan for unseen goals in a multi-goal Reinforcement Learning setting.  ...  We map these embeddings onto a discrete representation using an action-conditioned hidden Markov model, which is trained end-to-end with the neural network.  ...  The authors also thank members of the GRAIL and Helping Hands groups at Northeastern, as well as anonymous reviewers, for helpful comments on the manuscript.  ... 
arXiv:2003.04300v3 fatcat:d2nzzyye7rdxzkyrcywtm4rcea
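The planning step this snippet alludes to — once continuous observations are mapped to a small set of discrete abstract states, an action-conditioned transition table over those states supports search to unseen goals — can be sketched with toy data. The quantizer and trajectory here are illustrative stand-ins for the learned encoder and hidden Markov model.

```python
from collections import defaultdict

def quantize(obs):
    return int(obs // 1.0)  # stand-in for the learned discrete encoder

# build an action-conditioned abstract transition table from experience
transitions = defaultdict(set)  # (abstract_state, action) -> next states
trajectory = [(0.2, "right", 1.3), (1.3, "right", 2.4), (2.4, "left", 1.1)]
for obs, action, next_obs in trajectory:
    transitions[(quantize(obs), action)].add(quantize(next_obs))

def plan(start, goal):
    """Breadth-first search in the abstract model for an action sequence."""
    frontier, seen = [(start, [])], {start}
    while frontier:
        state, path = frontier.pop(0)
        if state == goal:
            return path
        for (s, a), nexts in transitions.items():
            if s == state:
                for n in nexts:
                    if n not in seen:
                        seen.add(n)
                        frontier.append((n, path + [a]))
    return None

print(plan(0, 2))  # -> ['right', 'right']
```

Planning in the small abstract model is cheap, which is what makes reaching unseen goals efficient in the multi-goal setting.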

Reinforcement Learning for Semantic Segmentation in Indoor Scenes [article]

Md. Alimoor Reza, Jana Kosecka
2016 arXiv   pre-print
Parameters for the independent binary segmentation models can be learned very efficiently, and the combination strategy---learned using reinforcement learning---can be set independently and can vary over  ...  We pursue these observations in developing a more modular and flexible approach to multi-class parsing of RGBD data based on learning strategies for combining independent binary object-vs-background segmentations  ...  In case the parameters of the MDP are not known, reinforcement learning can be used to learn the optimal policy.  ... 
arXiv:1606.01178v1 fatcat:i6f65u2wmza25l7gopag3i2myy

Deep Reinforcement Learning: An Overview [chapter]

Seyed Sajad Mousavi, Michael Schukat, Enda Howley
2017 Lecture Notes in Networks and Systems  
In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition  ...  This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures such as autoencoders, convolutional neural networks and recurrent neural networks  ...  However, learning the parameters (deep nets with many hidden layers are led to have millions of parameters to learn) in a deep architecture is a difficult optimization task which imposes very high computational  ... 
doi:10.1007/978-3-319-56991-8_32 fatcat:hyphl437obfqpfkphppc5bku3y

Shaping multi-agent systems with gradient reinforcement learning

Olivier Buffet, Alain Dutech, François Charpillet
2007 Autonomous Agents and Multi-Agent Systems  
An original Reinforcement Learning (RL) methodology is proposed for the design of multi-agent systems.  ...  To that end, we design simple reactive agents in a decentralized way as independent learners.  ...  a harder learning task.  ... 
doi:10.1007/s10458-006-9010-5 fatcat:7pdttndjyzhtrelvedbrh5y4o4
Showing results 1–15 of 1,665