Eigenoption Discovery through the Deep Successor Representation [article]

Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell
2018 arXiv   pre-print
It exploits recent successes in the deep reinforcement learning literature and the equivalence between proto-value functions and the successor representation.  ...  In this paper we focus on the recently introduced idea of using representation learning methods to guide the option discovery process.  ...  Bellemare and Michael Bowling for useful discussions, and the anonymous reviewers for their feedback and suggestions.  ... 
arXiv:1710.11089v3 fatcat:2z2d4wsrm5c5vacbsctfnypytm

Temporal Abstraction in Reinforcement Learning with the Successor Representation [article]

Marlos C. Machado, Andre Barreto, Doina Precup
2021 arXiv   pre-print
In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and  ...  We cast these results as instantiations of a general framework for option discovery in which the agent's representation is used to identify useful options, which are then used to further improve its representation  ...  The authors would like to thank Tom Schaul and Adam White for their thorough feedback on an earlier draft; and Dale Schuurmans, Yuu Jinnai, Marc G.  ... 
arXiv:2110.05740v1 fatcat:zrguhnljlvbyhlvqlir4cpfrx4

Option Discovery in the Absence of Rewards with Manifold Analysis [article]

Amitay Bar, Ronen Talmon, Ron Meir
2020 arXiv   pre-print
As opposed to the common practice used in previous methods, our algorithm makes full use of the spectrum of the graph Laplacian.  ...  Incorporating modes associated with higher graph frequencies unravels domain subtleties, which are shown to be useful for option discovery.  ...  The work of RM is partially supported by the Ollendorff Center of the Viterbi Faculty of Electrical Engineering at the Technion, and by the Skillman chair in biomedical sciences.  ... 
arXiv:2003.05878v2 fatcat:g2sgedwvyrgkpgbefus7jtyjsm

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation [article]

Scott Fujimoto, David Meger, Doina Precup
2021 arXiv   pre-print
The successor representation can be trained through deep reinforcement learning methodology and decouples the reward optimization from the dynamics of the environment, making the resulting algorithm stable  ...  We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.  ...  Acknowledgements Scott Fujimoto is supported by a NSERC scholarship as well as the Borealis AI Global Fellowship Award.  ... 
arXiv:2106.06854v1 fatcat:4z7eaqfh6rfkjao3z2qip34n4i

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [article]

Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
2021 arXiv   pre-print
To effectively use this data, we turn to the framework of successor features. This allows us to disentangle shared features and dynamics of the environment from agent-specific rewards and policies.  ...  We study reinforcement learning (RL) with no-reward demonstrations, a setting in which an RL agent has access to additional data from the interaction of other agents with the same environment.  ...  We also thank the anonymous reviewers for useful comments during the review process. A.F. is funded by a J.P.Morgan PhD Fellowship and C.L. is funded by an Open Phil AI Fellowship.  ... 
arXiv:2102.12560v2 fatcat:iihuwyxiyvduvgc7em5hahxway

Efficient Exploration in Reinforcement Learning through Time-Based Representations

Marlos Cholodovskis Machado
In the reinforcement learning (RL) problem an agent must learn how to act optimally through trial-and-error interactions with a complex, unknown, stochastic environment.  ...  The actions taken by the agent influence not just the immediate reward it observes but also the future states and rewards it will observe, implicitly requiring the agent to deal with the trade-off between  ...  "Eigenoption Discovery through the Deep Successor Representation". In: Proceedings of the International Conference on Learning Representations (ICLR). [3] Marlos C. .  ... 
doi:10.7939/r3-zaq3-vs36 fatcat:ncn5ppb4kbbbrmulupi6x34cpa

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning [article]

David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek
2019 arXiv   pre-print
We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL.  ...  However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems  ...  Acknowledgements We thank Matej Balog and the anonymous reviewers for their helpful comments and suggestions. Jiri Hron acknowledges support by a Nokia CASE Studentship.  ... 
arXiv:1810.06530v5 fatcat:v5hr3hie5ffkfpoctnc3qrc4eq

Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [article]

Christopher Hoang, Sungryull Sohn, Jongwook Choi, Wilka Carvalho, Honglak Lee
2021 arXiv   pre-print
SFL leverages the ability of successor features (SF) to capture transition dynamics, using it to drive exploration by estimating state-novelty and to enable high-level planning by abstracting the state-space  ...  In this work, we introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments so as to obtain a policy that is proficient for any goal.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.  ... 
arXiv:2111.09858v1 fatcat:opnzhm5bl5h6dc7utpfcoo7dku

An open-ended learning architecture to face the REAL 2020 simulated robot competition [article]

Emilio Cartoni
2020 arXiv   pre-print
The architecture represents the first model to solve the simpler REAL 2020 'Round 1' allowing the use of a simple parameterised push action.  ...  The multiple challenges posed by open-ended learning have been operationalized in the robotic competition REAL 2020.  ...  states can be successors of another state; from the CIGAN representation a plan is obtained Figure 1: (a) The environment with the robot, the table with the shelf, and the three objects.  ... 
arXiv:2011.13880v1 fatcat:lkzbjdnf4re3lf54kqjjmgak4u

On The Effect of Auxiliary Tasks on Representation Dynamics [article]

Clare Lyle, Mark Rowland, Georg Ostrovski, Will Dabney
2021 arXiv   pre-print
Through this approach, we establish a connection between the spectral decomposition of the transition operator and the representations induced by a variety of auxiliary tasks.  ...  While auxiliary tasks play a key role in shaping the representations learnt by reinforcement learning agents, much is still unknown about the mechanisms through which this is achieved.  ...  We also thank the anonymous reviewers for useful comments during the review process. CL is funded by an Open Phil AI Fellowship.  ... 
arXiv:2102.13089v1 fatcat:mkmlajzdpze77hdn4hqgxnvfdy

A Survey of Exploration Methods in Reinforcement Learning [article]

Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
2021 arXiv   pre-print
Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process as the lack of enough information could hinder effective learning.  ...  Later work by Machado et al. (2018b) proposed an improved version of eigenoption discovery, extending it to stochastic environments with nontabular states.  ...  encourage deep exploration through adopting optimistic value function.  ... 
arXiv:2109.00157v2 fatcat:dlqhzwxscnfbxpt2i6rp7ovp6i