
Policy Gradients with Parameter-Based Exploration for Control [chapter]

Frank Sehnke, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters, Jürgen Schmidhuber
Lecture Notes in Computer Science  
For several complex control tasks, including robust standing with a humanoid robot, we show that our method outperforms well-known algorithms from the fields of policy gradients, finite difference methods  ...  Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower variance gradient estimates than those obtained by policy gradient methods such as REINFORCE.  ...  Acknowledgement: This work is supported within the DFG excellence research cluster "Cognition for Technical Systems -CoTeSys",  ... 
doi:10.1007/978-3-540-87536-9_40 fatcat:6ulgdt3tezeaznarl52m6wuyou
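The sampling-in-parameter-space idea summarized in this abstract can be sketched in a few lines. Everything below — the quadratic stand-in for an episode return, the learning rate, the sample count — is an illustrative assumption, not taken from the paper: a Gaussian hyper-distribution (μ, σ) over policy parameters is adapted with a REINFORCE-style likelihood gradient on sampled parameters rather than sampled actions.

```python
import numpy as np

def episode_return(theta):
    # Illustrative deterministic policy evaluation: the optimum is theta = (1, -1).
    return -np.sum((theta - np.array([1.0, -1.0])) ** 2)

rng = np.random.default_rng(0)
mu = np.zeros(2)          # mean of the Gaussian hyper-distribution
sigma = np.ones(2)        # per-parameter exploration width
alpha = 0.05              # learning rate

for _ in range(300):
    thetas = rng.normal(mu, sigma, size=(20, 2))   # sample parameters, not actions
    returns = np.array([episode_return(t) for t in thetas])
    adv = returns - returns.mean()                 # baseline-subtracted returns
    # Likelihood gradients of the factored Gaussian (REINFORCE in parameter space):
    #   d log N / d mu    = (theta - mu) / sigma^2
    #   d log N / d sigma = ((theta - mu)^2 - sigma^2) / sigma^3
    d_mu = np.mean(adv[:, None] * (thetas - mu) / sigma**2, axis=0)
    d_sigma = np.mean(adv[:, None] * ((thetas - mu)**2 - sigma**2) / sigma**3, axis=0)
    mu += alpha * d_mu
    sigma = np.maximum(sigma + alpha * d_sigma, 0.1)   # keep a floor of exploration

print(np.round(mu, 2))   # close to the optimum (1, -1)
```

Because each sampled θ is evaluated by a deterministic rollout, the only noise is in the parameter draw itself, which is the source of the lower-variance gradient estimates the abstract claims.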

Exploring Parameter Space in Reinforcement Learning

Thomas Rückstieß, Frank Sehnke, Tom Schaul, Daan Wierstra, Yi Sun, Jürgen Schmidhuber
2010 Paladyn: Journal of Behavioral Robotics  
We review two recent parameter-exploring algorithms: Natural Evolution Strategies and Policy Gradients with Parameter-Based Exploration.  ...  AbstractThis paper discusses parameter-based exploration methods for reinforcement learning.  ...  Gradients with Parameter-based Exploration Following the idea above, in [22] it is proposed to use policy gradients with parameter-based exploration, where a distribution over the parameters of a controller  ... 
doi:10.2478/s13230-010-0002-4 fatcat:l7s2nghtzrfpdgmoeguff4rfve

EM-based policy hyper parameter exploration: application to standing and balancing of a two-wheeled smartphone robot

Jiexin Wang, Eiji Uchibe, Kenji Doya
2016 Artificial Life and Robotics  
: Policy Gradient with Parameter Exploration (PGPE) and EM-based Reward-Weighted Regression.  ...  Sehnke et al. proposed a method called Policy Gradients with Parameter-Based Exploration (PGPE) [4] to solve this problem by evaluating deterministic policies with the parameters sampled from a prior  ...  Parameter-based Exploration.  ... 
doi:10.1007/s10015-015-0260-7 fatcat:nq3f7a3pxfcjhekwh3q5ccsmym

Generalize Robot Learning From Demonstration to Variant Scenarios With Evolutionary Policy Gradient

Junjie Cao, Weiwei Liu, Yong Liu, Jian Yang
2020 Frontiers in Neurorobotics  
Our Evolutionary Policy Gradient combines parameter perturbation with the policy gradient method in the framework of Evolutionary Algorithms (EAs) and can fuse the benefits of both, achieving effective and  ...  The experiments, carried out on robot control tasks in OpenAI Gym with dense and sparse rewards, show that our EPG is able to provide competitive performance over the original policy gradient methods and  ...  ., 2010) performs gradient-based search in parameter space with low variance and is similar to ES. Wang et al. (2017) improve PGPE with EM-based policy exploration and an adaptive mechanism.  ... 
doi:10.3389/fnbot.2020.00021 pmid:32372940 pmcid:PMC7188386 fatcat:lodwo6wq2ngvlcfccuhzaa5fay

A Survey on Policy Search for Robotics

Marc Peter Deisenroth
2011 Foundations and Trends in Robotics  
Policy search is a subfield of reinforcement learning which focuses on finding good parameters for a given policy parametrization.  ...  We classify model-free methods based on their policy evaluation strategy, policy update strategy, and exploration strategy, and present a unified view on existing algorithms.  ...  For high-dimensional policy parameters with k > 50, we recommend using analytic policy gradients if they are available.  ... 
doi:10.1561/2300000021 fatcat:bfka3vw4njgrhjiz5kbpey42nq

Generalized exploration in policy search

Herke van Hoof, Daniel Tanneberg, Jan Peters
2017 Machine Learning  
We introduce a unifying view on step-based and episode-based exploration that allows for such balanced trade-offs. This trade-off strategy can be used with various reinforcement learning algorithms.  ...  In this paper, we study this generalized exploration strategy in a policy gradient method and in relative entropy policy search.  ...  Generalized exploration for policy gradients In policy gradient methods, as the name implies, the policy parameters are updated by a step in the direction of the estimated gradient of the expected return  ... 
doi:10.1007/s10994-017-5657-1 fatcat:6677wvgemnfubcxc6jrlqbymty
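The definition quoted from this paper — updating the policy parameters by a step along the estimated gradient of the expected return — can be illustrated with a minimal REINFORCE-style sketch. The two-armed bandit, softmax policy, and learning rate are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)    # policy parameters: one logit per arm
alpha = 0.1            # learning rate

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)                          # step-based exploration via the stochastic policy
    r = rng.normal(1.0 if a == 1 else 0.0, 0.1)     # arm 1 pays more on average
    grad_log = -p
    grad_log[a] += 1.0                              # d/dtheta log pi(a | theta) for a softmax policy
    theta += alpha * r * grad_log                   # step along the estimated return gradient

print(softmax(theta))   # probability mass concentrates on the better arm
```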

Trajectory-Based Off-Policy Deep Reinforcement Learning [article]

Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel
2019 arXiv   pre-print
This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies.  ...  However, these methods are also data-inefficient, afflicted with high-variance gradient estimates, and frequently get stuck in local optima.  ...  policy parameters θ (cf. (2)), the trajectory likelihood gradient ∇_θ log p(τ | θ) with respect to the policy parameters can be computed analytically for a given, differentiable policy ∇_θ log π(a_t |  ... 
arXiv:1905.05710v1 fatcat:6po2azo7yndsrjmh4ewcdnfmum
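The identity this abstract invokes — that the trajectory likelihood gradient decomposes into a sum of per-step action log-probability gradients, because the dynamics terms do not depend on θ — can be checked numerically. The linear-Gaussian policy and the fixed five-step trajectory below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def logpi(a, s, theta, sig=0.5):
    # Linear-Gaussian policy: a ~ N(theta * s, sig^2).
    return -0.5 * ((a - theta * s) / sig) ** 2 - np.log(sig * np.sqrt(2.0 * np.pi))

theta = 0.7
states = rng.normal(size=5)     # a short, fixed trajectory
actions = rng.normal(size=5)

# Since p(s'|s,a) is independent of theta, grad_theta log p(tau|theta)
# reduces to sum_t grad_theta log pi(a_t | s_t, theta):
analytic = np.sum((actions - theta * states) * states / 0.5**2)

# Finite-difference check on the summed per-step log-likelihood.
eps = 1e-6
ll = lambda th: np.sum(logpi(actions, states, th))
numeric = (ll(theta + eps) - ll(theta - eps)) / (2 * eps)

print(abs(analytic - numeric) < 1e-4)   # True
```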

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods [article]

Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup
2019 arXiv   pre-print
We propose that exploration can be achieved by entropy regularization with the discounted state distribution in policy gradients, where a metric for maximal coverage of the state space can be based on  ...  The policy gradient theorem is defined based on an objective with respect to the initial distribution over states.  ...  in policy gradient based methods.  ... 
arXiv:1912.05104v1 fatcat:laymienps5hulb2o5pub5rjdou
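The discounted state distribution named in this abstract, and the entropy the paper proposes to regularize, can be estimated from a single rollout roughly as follows. The toy state space, trajectory, and discount factor are illustrative assumptions:

```python
import numpy as np

gamma = 0.9
trajectory = [0, 1, 1, 2, 3, 3, 3, 1]      # visited state indices; 4 states in total

# Discounted visitation weights: later visits count less.
weights = np.zeros(4)
for t, s in enumerate(trajectory):
    weights[s] += gamma ** t
d = weights / weights.sum()                # normalized discounted state distribution

# Entropy of d; maximizing it pushes the policy toward broader state coverage.
entropy = -np.sum(d[d > 0] * np.log(d[d > 0]))
print(round(entropy, 3))
```

In the paper's setting this entropy term is added to the usual policy gradient objective, so the update trades off return against coverage of the state space.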

Stein Variational Policy Gradient [article]

Yang Liu, Prajit Ramachandran, Qiang Liu, Jian Peng
2017 arXiv   pre-print
However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration.  ...  In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem  ...  Deep neural networks trained with policy gradient methods have demonstrated impressive performance on continuous control, vision-based navigation and Atari games (Schulman et al., 2015b, Kakade, 2002  ... 
arXiv:1704.02399v1 fatcat:z3vyclidujfzdgneix72v3akf4
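The parameter-exploration framework this abstract describes builds on Stein variational gradient descent. A toy SVGD update over policy-parameter particles might look like the following sketch, where the quadratic stand-in for the (temperature-scaled) expected return, the RBF bandwidth, and the step size are all illustrative assumptions:

```python
import numpy as np

def grad_log_p(theta):
    # Target density p(theta) ∝ exp(J(theta)) with J(theta) = -||theta||^2,
    # standing in for a temperature-scaled expected return.
    return -2.0 * theta

rng = np.random.default_rng(3)
particles = rng.normal(3.0, 1.0, size=(30, 2))    # parameter particles, far from optimum
h, step = 1.0, 0.05                               # RBF bandwidth and step size

for _ in range(300):
    diff = particles[:, None, :] - particles[None, :, :]   # x_i - x_j, shape (n, n, d)
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * h))        # RBF kernel k(x_j, x_i), symmetric
    glp = grad_log_p(particles)
    drive = k @ glp                                        # sum_j k(x_j, x_i) grad log p(x_j)
    repulse = np.sum(k[:, :, None] * diff, axis=1) / h     # sum_j grad_{x_j} k(x_j, x_i)
    particles += step * (drive + repulse) / len(particles)

print(np.round(particles.mean(axis=0), 1))   # cluster centers near the optimum
```

The `drive` term pulls every particle toward high-return regions, while `repulse` is the deterministic kernel repulsion that keeps the particle set diverse — the explicit parameter exploration the abstract refers to.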

Evolution-Guided Policy Gradient in Reinforcement Learning [article]

Shauharda Khadka, Kagan Tumer
2018 arXiv   pre-print
off-policy DRL's ability to leverage gradients for higher sample efficiency and faster learning.  ...  ERL inherits EA's ability of temporal credit assignment with a fitness metric, effective exploration with a diverse set of policies, and stability of a population-based approach and complements it with  ...  The parameter controls how frequently the exploration in action space (rl actor) shares information with the exploration in parameter space (actors in the evolutionary population).  ... 
arXiv:1805.07917v2 fatcat:336fcth2qvac3hhjxsxffhhouq
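The periodic information flow described in this snippet — a gradient-trained actor being injected into the evolutionary population — can be caricatured with a one-parameter "policy". The toy fitness, mutation scale, and sync interval below are illustrative assumptions, not ERL's actual hyperparameters:

```python
import random

random.seed(0)

def fitness(policy):
    return -(policy - 2.0) ** 2      # toy episode return, maximized at policy = 2.0

population = [random.uniform(-5.0, 5.0) for _ in range(10)]   # evolutionary actors
rl_actor = 0.0                                                # gradient-trained actor
sync_every = 5                                                # the frequency parameter

for gen in range(1, 51):
    rl_actor += 0.2 * (2.0 - rl_actor)        # stand-in for off-policy gradient updates
    population.sort(key=fitness, reverse=True)
    elites = population[:5]                   # selection: keep the fitter half
    population = elites + [p + random.gauss(0.0, 0.3) for p in elites]   # mutate to refill
    if gen % sync_every == 0:
        population[-1] = rl_actor             # inject the RL actor into the population

best = max(population, key=fitness)
print(round(best, 1))   # → 2.0
```

`sync_every` plays the role of the parameter the snippet mentions: smaller values share gradient information with the population more often.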

Policy gradient methods

Jan Peters
2010 Scholarpedia  
In optimal control, model-based gradient methods have been used for optimizing policies since the late 1960s.  ...  θ_2) with an exploration parameter θ_2, see [8, 5].  ... 
doi:10.4249/scholarpedia.3698 fatcat:vpn346nvkneq3bswbs5xxdvmzu

Policy Gradient Methods [chapter]

Jan Peters, J. Andrew Bagnell
2016 Encyclopedia of Machine Learning and Data Mining  
In optimal control, model-based gradient methods have been used for optimizing policies since the late 1960s.  ...  θ_2) with an exploration parameter θ_2, see [8, 5].  ... 
doi:10.1007/978-1-4899-7502-7_646-1 fatcat:kxs2bj7mrref5d2a55xmh7q7uq

Model-Based Reinforcement Learning [chapter]

Soumya Ray, Prasad Tadepalli
2014 Encyclopedia of Machine Learning and Data Mining  
The models predict the outcomes of actions and are used in lieu of or in addition to interaction with the environment to learn optimal policies.  ...  Model-based Reinforcement Learning refers to learning optimal behavior indirectly by learning a model of the environment by taking actions and observing the outcomes that include the next state and the  ...  Efficient Exploration in Reinforcement Learning, Symbolic Dynamic Programming, Adaptive Real-time Dynamic Programming, Bayesian Reinforcement Learning, Autonomous Helicopter Flight Using Reinforcement  ... 
doi:10.1007/978-1-4899-7502-7_561-1 fatcat:4pwzznqsefhq3e2oqs2mavvxp4
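The indirect learning loop this entry defines — learn a model of the environment from observed transitions, then use it in lieu of further interaction to find an optimal policy — can be shown with a tiny tabular sketch. The chain environment, random data collection, and discount factor are illustrative assumptions:

```python
import random

random.seed(0)

# Environment hidden from the planner: a 4-state chain, action 0 = left, 1 = right,
# reward 1 for being in state 3 after the step.
def env_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s2, 1.0 if s2 == 3 else 0.0

# 1) Learn a model by taking actions and observing next state and reward.
model = {}
for _ in range(200):
    s, a = random.randrange(4), random.randrange(2)
    model[(s, a)] = env_step(s, a)   # deterministic env: one observation per pair suffices

# 2) Plan on the learned model with value iteration (gamma = 0.9), no further interaction.
V = [0.0] * 4
for _ in range(50):
    V = [max(model[(s, a)][1] + 0.9 * V[model[(s, a)][0]] for a in range(2))
         for s in range(4)]

print([round(v, 2) for v in V])   # values are highest near the rewarding end of the chain
```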

Learning to Explore with Meta-Policy Gradient [article]

Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng
2018 arXiv   pre-print
Existing exploration methods are mostly based on adding noise to the on-going actor policy and can only explore local regions close to what the actor policy dictates.  ...  In this work, we develop a simple meta-policy gradient algorithm that allows us to adaptively learn the exploration policy in DDPG.  ...  Acknowledgement: We thank Kliegl Markus for his insightful discussions and helpful comments.  ... 
arXiv:1803.05044v2 fatcat:gxgkv6uljfbzjmfpwgzk6itof4

Meta Reinforcement Learning with Distribution of Exploration Parameters Learned by Evolution Strategies [article]

Yiming Shen, Kehan Yang, Yufeng Yuan, Simon Cheng Liu
2019 arXiv   pre-print
In this paper, we propose a novel meta-learning method in a reinforcement learning setting, based on evolution strategies (ES), exploration in parameter space and deterministic policy gradients.  ...  We demonstrate that our method achieves good results compared to gradient-based meta-learning in high-dimensional control tasks in the MuJoCo simulator.  ...  The K samples have higher deviations for better exploration while the initial policy with lower deviation can make the learning more stable.  ... 
arXiv:1812.11314v2 fatcat:qbvc7ba7kvdjjod3k5wyqzxctm