815 Hits in 5.1 sec

Information asymmetry in KL-regularized RL [article]

Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess
2019 arXiv   pre-print
In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning.  ...  We start from the KL regularized expected reward objective which introduces an additional component, a default policy. Instead of relying on a fixed default policy, we learn it from data.  ...  Appendix A: KL-regularized RL and information bottleneck. In this appendix we derive the connection between KL-regularized RL and information bottleneck in detail.  ... 
arXiv:1905.01240v1 fatcat:k67zvbbttrcgrb2kiuidplentq
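
The snippet above refers to the KL-regularized expected reward objective with a learned default policy. One common way to write it, assumed here, is the discounted sum of r_t − α·KL(π(·|s_t) ‖ π₀(·|s_t)) along a trajectory. The sketch below evaluates that quantity for one trajectory with discrete actions. `kl_categorical`, `kl_regularized_return`, `alpha`, and `gamma` are hypothetical names for illustration, not the authors' code. Under information asymmetry, π₀ would be computed from a restricted subset of the agent's observations; the sketch abstracts this away by taking both action distributions as given.

```python
import math

def kl_categorical(p, q):
    """KL(p || q) for categorical distributions given as probability lists.
    Assumes q > 0 wherever p > 0; terms with p = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_regularized_return(rewards, policies, default_policies, alpha, gamma=1.0):
    """Hypothetical helper: sum_t gamma^t * (r_t - alpha * KL(pi_t || pi0_t)).

    rewards          -- per-step rewards r_t
    policies         -- per-step action distributions pi(.|s_t)
    default_policies -- per-step default-policy distributions pi0(.|s_t)
    alpha            -- strength of the KL penalty (assumed hyperparameter)
    """
    total = 0.0
    for t, (r, pi, pi0) in enumerate(zip(rewards, policies, default_policies)):
        total += (gamma ** t) * (r - alpha * kl_categorical(pi, pi0))
    return total

# Usage: a deterministic policy pays log(2) nats per step against a uniform default.
value = kl_regularized_return([1.0, 1.0], [[1.0, 0.0]] * 2, [[0.5, 0.5]] * 2, alpha=0.1)
```

Larger `alpha` pulls the policy harder toward the default; a learned default that captures shared structure makes this penalty cheap on behaviors common across tasks.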

Exploiting Hierarchy for Learning and Transfer in KL-regularized RL [article]

Dhruva Tirumala, Hyeonwoo Noh, Alexandre Galashov, Leonard Hasenclever, Arun Ahuja, Greg Wayne, Razvan Pascanu, Yee Whye Teh, Nicolas Heess
2020 arXiv   pre-print
The KL-regularized expected reward objective constitutes one possible tool to this end.  ...  In this work we consider the implications of this framework in cases where both the policy and default behavior are augmented with latent variables.  ...  Information asymmetry in KL-regularized RL. In International Conference on Learning Representations, 2019. Ghosh, D., Singh, A., Rajeswaran, A., Kumar, V., and Levine, S.  ... 
arXiv:1903.07438v2 fatcat:w5wazaicjbg3nobuxqnyna2tt4

Robot Learning of Mobile Manipulation with Reachability Behavior Priors [article]

Snehal Jauhri, Jan Peters, Georgia Chalvatzaki
2022 arXiv   pre-print
In this work, we study the integration of robotic reachability priors in actor-critic RL methods for accelerating the learning of MM for reaching and fetching tasks.  ...  Moreover, we find that regularizing the target policy with a prior policy yields more expressive behaviors.  ...  We found that applying forward KL regularization is more beneficial for the policy fitting, as it incentivizes the agent to match the relevant part of the prior due to information asymmetry, but still  ... 
arXiv:2203.04051v3 fatcat:ix4fgkkc45fj7jzzjutsdajo5m
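
The snippet contrasts KL directions when regularizing a target policy toward a prior. KL(prior ‖ policy) heavily penalizes a policy that assigns near-zero probability to actions the prior supports (mass-covering). KL(policy ‖ prior) mainly penalizes the policy for putting mass where the prior has little (mode-seeking). Which direction the authors call "forward" is defined in their paper; this sketch, with hypothetical distributions, only illustrates that the two directions differ.

```python
import math

def kl(p, q):
    """KL(p || q) for categorical distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

policy = [0.98, 0.01, 0.01]  # nearly deterministic target policy (hypothetical)
prior = [0.40, 0.30, 0.30]   # broader behavior prior (hypothetical)

kl_policy_prior = kl(policy, prior)  # mode-seeking direction
kl_prior_policy = kl(prior, policy)  # mass-covering direction

# The mass-covering direction is larger here, because the policy nearly
# ignores two actions that the prior considers plausible.
print(kl_policy_prior, kl_prior_policy)
```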

Behavior Priors for Efficient Reinforcement Learning [article]

Dhruva Tirumala, Alexandre Galashov, Hyeonwoo Noh, Leonard Hasenclever, Razvan Pascanu, Jonathan Schwarz, Guillaume Desjardins, Wojciech Marian Czarnecki, Arun Ahuja, Yee Whye Teh, Nicolas Heess
2020 arXiv   pre-print
In this work we consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors that capture the common movement and  ...  We then extend these ideas to latent variable models and consider a formulation to learn hierarchical priors that capture different aspects of the behavior in reusable modules.  ...  We would also like to acknowledge David Szepesvari for their input on the role of hierarchy in RL. A large part of this work is the fruit of these discussions.  ... 
arXiv:2010.14274v1 fatcat:rctmp6klcnf4xfzx6zkgacdhwi

Minimum Description Length Control [article]

Ted Moskovitz, Ta-Chu Kao, Maneesh Sahani, Matthew M. Botvinick
2022 arXiv   pre-print
In doing so, MDL-C naturally balances adaptation to each task with epistemic uncertainty about the task distribution.  ...  In this approach, which we term MDL-control (MDL-C), the agent learns the common structure among the tasks with which it is faced and then distills it into a simpler representation which facilitates faster  ...  The RPO algorithms with learned default policies replace KL[π(·|s), Unif_A] with KL[π(·|s), π_w(·|s)] (or KL[π_w(·|s), π(·|s)]). Figure H.2: To test the effect of information asymmetry on its own performance  ... 
arXiv:2207.08258v3 fatcat:feuhufw4vfhhxc5nufhvuy2bla
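
The replacement of KL[π(·|s), Unif_A] by KL[π(·|s), π_w(·|s)] is easier to read with one identity in hand: for |A| discrete actions, KL(π ‖ Unif_A) = log|A| − H(π). Penalizing KL to a uniform default is therefore an entropy bonus up to a constant, whereas a learned π_w pulls the policy toward shared structure instead. The sketch checks the identity numerically; `entropy` and `kl_to_uniform` are hypothetical helper names.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats for a categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_to_uniform(p):
    """KL(p || Uniform over len(p) actions) = sum_a p(a) * log(p(a) * |A|)."""
    n = len(p)
    return sum(pi * math.log(pi * n) for pi in p if pi > 0)

p = [0.7, 0.2, 0.1]
# Both sides of KL(p || U) = log|A| - H(p) should agree up to rounding.
lhs = kl_to_uniform(p)
rhs = math.log(len(p)) - entropy(p)
print(lhs, rhs)
```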

Machine-learning Kondo physics using variational autoencoders [article]

Cole Miles, Matthew R. Carbone, Erica J. Sturm, Deyu Lu, Andreas Weichselbaum, Kipton Barros, Robert M. Konik
2021 arXiv   pre-print
In particular, one latent space component correlates with particle-hole asymmetry, while another is in near one-to-one correspondence with the Kondo temperature, a dynamically generated low-energy scale  ...  in the impurity model.  ...  Since D_KL measures the amount of information encoded into the latent space relative to the unit Gaussian prior [8], we interpret the region marked in yellow as where the "critical" amount of information  ... 
arXiv:2107.08013v1 fatcat:ozfpb32lkrc53fzch4mmdfp744
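
The D_KL term in that snippet measures how far the encoder's latent distribution departs from the unit Gaussian prior. Assuming the usual VAE setup with a diagonal Gaussian encoder N(μ, diag(σ²)) (the paper may parameterize it differently), it has the closed form ½ Σ_j (μ_j² + σ_j² − 1 − log σ_j²). The function name below is hypothetical.

```python
import math

def kl_diag_gaussian_to_standard(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats, summed over latent dims.

    mu      -- per-dimension means
    log_var -- per-dimension log-variances (log sigma^2)
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

# A latent that exactly matches the prior carries zero KL "information";
# shifting one mean by 1 costs 0.5 nats.
print(kl_diag_gaussian_to_standard([0.0, 0.0], [0.0, 0.0]))
print(kl_diag_gaussian_to_standard([1.0, 0.0], [0.0, 0.0]))
```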

Robust Asymmetric Learning in POMDPs [article]

Andrew Warrington and J. Wilder Lavington and Adam Ścibior and Mark Schmidt and Frank Wood
2021 arXiv   pre-print
We show that A2D produces an expert policy that the agent can safely imitate, in turn outperforming policies learned by imitating a fixed expert.  ...  approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even unsafe, under partial information  ...  tuning a regular RL algorithm still apply in A2D.  ... 
arXiv:2012.15566v3 fatcat:etbg3phqnvgdtfcm2ctbawhane

Importance Weighted Policy Learning and Adaptation [article]

Alexandre Galashov, Jakub Sygnowski, Guillaume Desjardins, Jan Humplik, Leonard Hasenclever, Rae Jeong, Yee Whye Teh, Nicolas Heess
2021 arXiv   pre-print
In this paper we study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.  ...  In the meta reinforcement learning literature much recent work has focused on the problem of optimizing the learning process itself.  ...  Some RL algorithms, such as REPS [19] , MPO [3] therefore replace similar (soft) regularization terms with hard limits on KL or entropy.  ... 
arXiv:2009.04875v2 fatcat:7376gfqvqfa4xfq3di35imcqgu

Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies [article]

Dushyant Rao, Fereshteh Sadeghi, Leonard Hasenclever, Markus Wulfmeier, Martina Zambelli, Giulia Vezzani, Dhruva Tirumala, Yusuf Aytar, Josh Merel, Nicolas Heess, Raia Hadsell
2022 arXiv   pre-print
In contrast to existing work, our method exploits a three-level hierarchy of both discrete and continuous latent variables, to capture a set of high-level behaviours while allowing for variance in how  ...  For robots operating in the real world, it is desirable to learn reusable behaviours that can effectively be transferred and adapted to numerous tasks and scenarios.  ...  Information asymmetry. As noted in previous work, hierarchical approaches often benefit from information asymmetry, with higher levels seeing additional context or task-specific information.  ... 
arXiv:2112.05062v2 fatcat:mudcv6wdo5a33lri3wua6qpdvq

Proximal Policy Optimization with Relative Pearson Divergence [article]

Taisuke Kobayashi
2021 arXiv   pre-print
As another problem of PPO, the symmetric threshold is given numerically while the density ratio itself lies in an asymmetric domain, thereby causing unbalanced regularization of the policy.  ...  Through its analysis, an intuitive threshold-based design consistent with the asymmetry of the threshold and the domain of the density ratio can be derived.  ...  Proximal policy optimization In PPO [10], the initial version included an explicit KL-divergence regularization.  ... 
arXiv:2010.03290v2 fatcat:gphdljbjzzhj7d3ys6nxo7izxe
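
The asymmetry the abstract points to can be seen in the standard PPO clipped surrogate (Schulman et al.), min(r·A, clip(r, 1−ε, 1+ε)·A), where the density ratio r lives in (0, ∞). The bounds 1 ± ε are symmetric around 1, but in log-ratio space they are not: |log(1−ε)| > log(1+ε). This sketch shows only that standard objective, not the relative-Pearson variant the paper proposes; `ppo_clip_objective` and `log_ratio_bounds` are hypothetical names.

```python
import math

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), in (0, inf)
    advantage -- advantage estimate A(s, a)
    eps       -- clipping threshold (0.2 is a common default)
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)

def log_ratio_bounds(eps):
    """Clip bounds expressed as log-ratios: (log(1 - eps), log(1 + eps))."""
    return math.log(1.0 - eps), math.log(1.0 + eps)

lower, upper = log_ratio_bounds(0.2)
print(lower, upper)  # the lower bound is farther from 0 than the upper bound
```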

Perception-Prediction-Reaction Agents for Deep Reinforcement Learning [article]

Adam Stooke, Valentin Dalibard, Siddhant M. Jayakumar, Wojciech M. Czarnecki, Max Jaderberg
2020 arXiv   pre-print
an information asymmetry.  ...  An auxiliary loss regularizes policies drawn from all three cores against each other, enacting the prior that the policy should be expressible from either recent or long-term memory.  ...  Our work also builds on recent approaches to learning priors with information asymmetry for RL.  ... 
arXiv:2006.15223v1 fatcat:q6ryxrrarfbd5bayupan7lg3bi

Compositional Transfer in Hierarchical Reinforcement Learning [article]

Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller
2020 arXiv   pre-print
We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time.  ...  The presented algorithm enables stable and fast learning for complex, real-world domains in the parallel multitask and sequential transfer case.  ...  Information asymmetry in KL-regularized RL. 2018. [16] Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, and Sergey Levine. Latent space policies for hierarchical reinforcement learning.  ... 
arXiv:1906.11228v3 fatcat:ipw6uxy4lzbbvizfhmuggj553m

Learning Dexterous Manipulation from Suboptimal Experts [article]

Rae Jeong, Jost Tobias Springenberg, Jackie Kay, Daniel Zheng, Yuxiang Zhou, Alexandre Galashov, Nicolas Heess, Francesco Nori
2021 arXiv   pre-print
Although in many cases the learning process could be guided by demonstrations or other suboptimal experts, current RL algorithms for continuous action spaces often fail to effectively utilize combinations  ...  Finally, we show that REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations. Videos and further materials are available at  ...  Information asymmetry in KL-regularized RL. In International Conference on Learning Representations, 2019. [33] D. Tirumala, H. Noh, A. Galashov, L. Hasenclever, A. Ahuja, G. Wayne, R.  ... 
arXiv:2010.08587v2 fatcat:l3o7m2ht6fakhiuzxbu66trg2i

Combating Noisy Labels by Agreement: A Joint Training Method with Co-Regularization

Hongxin Wei, Lei Feng, Xiangyu Chen, Bo An
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Deep Learning with noisy labels is a practically challenging problem in weakly supervised learning.  ...  Trained by the joint loss, these two networks would be more and more similar due to the effect of Co-Regularization.  ...  Co-RLS [34] extends standard regularization methods like Support Vector Machines (SVM) and Regularized Least squares (RLS) to multi-view semi-supervised learning by optimizing measures of agreement and  ... 
doi:10.1109/cvpr42600.2020.01374 dblp:conf/cvpr/WeiFC020 fatcat:go47ic2yvjfjvelttvjmsdj5qa

Disentangling Options with Hellinger Distance Regularizer [article]

Minsung Hyun, Junyoung Choi, Nojun Kwak
2019 arXiv   pre-print
In this paper, we propose a Hellinger distance regularizer, a method for disentangling options.  ...  The options framework provided clues to temporal abstraction in RL, and the option-critic architecture elegantly solved the two problems of finding options and learning RL agents in an end-to-end manner  ...  In order to complement the asymmetry of KLD, Lin (1991) proposed Jensen-Shannon divergence (JSD).  ... 
arXiv:1904.06887v1 fatcat:7qfzzrwmvvewfb4kxqmmgn6e74
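
The snippet contrasts the asymmetric KL divergence with the symmetric Jensen-Shannon divergence, and the paper itself uses a Hellinger distance. A small comparison on categorical distributions, assuming the 1/√2-normalized Hellinger distance so that it lies in [0, 1]; the function names are hypothetical and this is not the paper's option regularizer, only the underlying divergences.

```python
import math

def kl(p, q):
    """KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger(p, q):
    """Hellinger distance with 1/sqrt(2) normalization, bounded in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))

p, q = [0.9, 0.1], [0.5, 0.5]
# KL changes when its arguments are swapped; JSD and Hellinger do not.
print(kl(p, q), kl(q, p))
print(js_divergence(p, q), js_divergence(q, p))
print(hellinger(p, q), hellinger(q, p))
```

Unlike KL, both JSD and Hellinger stay finite for distributions with disjoint support, which makes them convenient when pushing option policies apart.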
Showing results 1–15 out of 815 results