The Eigenoption-Critic Framework [article]

Miao Liu, Marlos C. Machado, Gerald Tesauro, Murray Campbell
2017 arXiv pre-print
To address these issues, we introduce an algorithm termed eigenoption-critic (EOC) based on the Option-critic (OC) framework [Bacon17], a general hierarchical reinforcement learning (RL) algorithm that  ...  Eigenoptions (EOs) have recently been introduced as a promising idea for generating a diverse set of options through the graph Laplacian, and have been shown to allow efficient exploration.  ...  Generalization of the Eigenoption-Critic Framework to Continuous Domains. In this section, we discuss the extension of the Eigenoption-critic framework to problems with continuous state spaces.  ... 
arXiv:1712.04065v1 fatcat:slkplzgaczeyljtp532plykeiq

Temporal Abstraction in Reinforcement Learning with the Successor Representation [article]

Marlos C. Machado and Andre Barreto and Doina Precup
2021 arXiv pre-print
The results of our experiments shed light on design decisions involved in the definition of options and demonstrate the synergy of different methods based on the SR, such as eigenoptions and the option  ...  Nevertheless, approaches based on the options framework often start with the assumption that a reasonable set of options is known beforehand.  ...  The authors would like to thank Tom Schaul and Adam White for their thorough feedback on an earlier draft; and Dale Schuurmans, Yuu Jinnai, Marc G.  ... 
arXiv:2110.05740v1 fatcat:zrguhnljlvbyhlvqlir4cpfrx4

Learning Reusable Options for Multi-Task Reinforcement Learning [article]

Francisco M. Garcia, Chris Nota, Philip S. Thomas
2020 arXiv pre-print
In this paper, we propose a framework for exploiting existing experience by learning reusable options.  ...  Although there are many algorithms that allow an agent to solve tasks efficiently, they often ignore the possibility that prior experience related to the task at hand might be available.  ...  using KL divergence), eigenoptions, and option critic.  ... 
arXiv:2001.01577v1 fatcat:bxjlx72k7fbufd757dsaxuyjse

Finding Options that Minimize Planning Time [article]

Yuu Jinnai, David Abel, D Ellis Hershkowitz, Michael Littman, George Konidaris
2019 arXiv pre-print
We first show that the problem is NP-hard, even if the task is constrained to be deterministic: the first such complexity result for option discovery.  ...  We formalize the problem of selecting the optimal set of options for planning as that of computing the smallest set of options so that planning converges in less than a given maximum of value-iteration  ...  Acknowledgments We would like to thank the anonymous reviewer for their advice and suggestions to improve the inapproximability result for MOMI.  ... 
arXiv:1810.07311v3 fatcat:vqjfhncy2fb2rjhoizdhhdqray

Efficient Exploration in Reinforcement Learning through Time-Based Representations

Marlos Cholodovskis Machado
The actions taken by the agent influence not just the immediate reward it observes but also the future states and rewards it will observe, implicitly requiring the agent to deal with the trade-off between  ...  In this context, the problem of exploration is the  ...  The Devil to Pay in the Backlands, 1963. Translated from Portuguese by James L. Taylor and Harriet de Onís.  ...  "The Eigenoption-Critic Framework". In: CoRR abs/1712.04065. Presented at the NIPS-17 Hierarchical RL Workshop. [10] Marlos C. Machado, Marc G.  ... 
doi:10.7939/r3-zaq3-vs36 fatcat:ncn5ppb4kbbbrmulupi6x34cpa

The Laplacian in RL: Learning Representations with Efficient Approximations [article]

Yifan Wu, George Tucker, Ofir Nachum
2018 arXiv pre-print
In reinforcement learning (RL), where the weighted graph may be interpreted as the state transition process induced by a behavior policy acting on the environment, approximating the eigenvectors of the  ...  The smallest eigenvectors of the graph Laplacian are well-known to provide a succinct representation of the geometry of a weighted graph.  ...  Given the d eigenoptions, an embedding can be obtained by letting $f_i(s) = \psi(s)^\top e_i$.  ... 
arXiv:1810.04586v1 fatcat:qrxu5dwe6fcb5maemnkje72hmu
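The Wu et al. snippet compresses a standard idea: the Laplacian eigenvectors with the smallest eigenvalues give a low-dimensional embedding of states. The following is a minimal sketch, not the paper's method. It assumes a small hypothetical 4-state chain graph, an exact eigendecomposition (rather than the learned approximation the paper is about), and one-hot features for ψ(s). Under those assumptions, the embedding f_i(s) = ψ(s)^T e_i can be computed with NumPy:

```python
import numpy as np

# Hypothetical example graph (not from the paper): an unweighted 4-state chain 0-1-2-3.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])


def smallest_laplacian_eigenvectors(W, d):
    """Exact eigendecomposition of the combinatorial Laplacian L = D - W.

    Returns the d smallest eigenvalues and their eigenvectors (as columns).
    """
    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W
    # eigh is for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[:d], eigvecs[:, :d]


eigvals, E = smallest_laplacian_eigenvectors(W, d=2)

# Assumption: psi(s) is a one-hot state feature, so f_i(s) = psi(s)^T e_i
# is simply the s-th entry of eigenvector e_i, i.e. F[s, i] = E[s, i].
psi = np.eye(W.shape[0])
F = psi @ E  # row s of F is the d-dimensional embedding of state s
```

On this chain, the first column of `F` is constant (the zero-eigenvalue eigenvector). The second column varies monotonically from one end of the chain to the other, which is the kind of geometric information the snippet refers to.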

Performance Optimization Strategies for Transactional Memory Applications

Martin Otto Schindewolf
the optimization of TM applications.  ...  This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (Software, Hardware, and hybrid TM) and use information from all layers of the TM software stack  ...  To speed up this process, we introduce EigenOpt, an exploration tool based on EigenBench, as part of the VisOTMA framework.  ... 
doi:10.5445/ir/1000035636 fatcat:waxykj7aijfojinkttzryrfmfy

Variational Option Discovery Algorithms [article]

Joshua Achiam, Harrison Edwards, Dario Amodei, Pieter Abbeel
2018 arXiv pre-print
In VALOR, the policy encodes contexts from a noise distribution into trajectories, and the decoder recovers the contexts from the complete trajectories.  ...  Second: we propose a curriculum learning approach where the number of contexts seen by the agent increases whenever the agent's performance is strong enough (as measured by the decoder) on the current  ...  RL algorithms (soft actor-critic vs. entropy-regularized policy gradients).  ... 
arXiv:1807.10299v1 fatcat:6lu2wi4mszbkhjfly54olcwwxa

A Survey of Exploration Methods in Reinforcement Learning [article]

Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
2021 arXiv pre-print
Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process, since insufficient information can hinder effective learning.  ...  Optimal Probe (Duff, 2003) retains the full Bayesian framework but proposes to sidestep the intractable calculations by using a novel actor-critic architecture and proposing a corresponding policy-gradient  ...  flow (DIF) model, and subsequently, the eigenpurposes and eigenoptions are obtained.  ... 
arXiv:2109.00157v2 fatcat:dlqhzwxscnfbxpt2i6rp7ovp6i

Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching [article]

Pierre-Alexandre Kamienny, Jean Tarbouriech, Alessandro Lazaric, Ludovic Denoyer
2021 arXiv pre-print
In this paper, we build on the mutual information framework for skill discovery and introduce UPSIDE, which addresses the coverage-directedness trade-off in the following ways: 1) We design policies with  ...  of the environment.  ...  The actor and the critic share the same architecture: a neural network with two hidden layers (of size 64 for maze environments and 256 for control environments) that processes observations.  ... 
arXiv:2110.14457v1 fatcat:lwnhvhafz5dlnfumyx4gmmwyaq

On The Effect of Auxiliary Tasks on Representation Dynamics [article]

Clare Lyle, Mark Rowland, Georg Ostrovski, Will Dabney
2021 arXiv pre-print
Through this approach, we establish a connection between the spectral decomposition of the transition operator and the representations induced by a variety of auxiliary tasks.  ...  While auxiliary tasks play a key role in shaping the representations learnt by reinforcement learning agents, much is still unknown about the mechanisms through which this is achieved.  ...  We also thank the anonymous reviewers for useful comments during the review process. CL is funded by an Open Phil AI Fellowship.  ... 
arXiv:2102.13089v1 fatcat:mkmlajzdpze77hdn4hqgxnvfdy

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning [article]

David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek
2019 arXiv pre-print
However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems  ...  Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped  ...  Acknowledgements We thank Matej Balog and the anonymous reviewers for their helpful comments and suggestions. Jiri Hron acknowledges support by a Nokia CASE Studentship.  ... 
arXiv:1810.06530v5 fatcat:v5hr3hie5ffkfpoctnc3qrc4eq

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation [article]

Scott Fujimoto, David Meger, Doina Precup
2021 arXiv pre-print
The successor representation can be trained through deep reinforcement learning methodology and decouples the reward optimization from the dynamics of the environment, making the resulting algorithm stable  ...  We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.  ...  Acknowledgements Scott Fujimoto is supported by an NSERC scholarship as well as the Borealis AI Global Fellowship Award.  ... 
arXiv:2106.06854v1 fatcat:4z7eaqfh6rfkjao3z2qip34n4i