Improving reinforcement learning algorithms: towards optimal learning rate policies [article]

Othmane Mounjid, Charles-Albert Lehalle
2021 arXiv   pre-print
This paper investigates to what extent one can improve reinforcement learning algorithms. Our study is split in three parts.  ...  Second, we propose a dynamic optimal policy for the choice of the learning rate (γ_k)_k≥0 used in stochastic approximation (SA).  ...  However, in this section, we present an optimal dynamic policy for the choice of the learning rate (γ_k)_k∈N.  ... 
arXiv:1911.02319v6 fatcat:66q2tkqo5jclpawcjglpapdihe

Reinforcement Learning for Improving Agent Design [article]

David Ha
2018 arXiv   pre-print
In many reinforcement learning tasks, the goal is to learn a policy to manipulate an agent, whose design is fixed, to maximize some notion of cumulative reward.  ...  The design of the agent's physical structure is rarely optimized for the task at hand.  ...  While the original design is symmetric, the learned design (Table 1) breaks symmetry, and biases towards larger rear legs while jointly learning the navigation policy using an asymmetric body.  ... 
arXiv:1810.03779v2 fatcat:tpronrfxxvazfjgfqgcpihfzzy

An Improved Sarsa(λ) Reinforcement Learning Algorithm for Wireless Communication Systems

Hao Jiang, Renjie Gui, Zhen Chen, Liang Wu, Jian Dang, Jie Zhou
2019 IEEE Access  
Numerical results demonstrate that the proposed algorithm has the advantage of high learning efficiency and a higher learning-rate tolerance range than Q Learning, Sarsa, Expected Sarsa, and Sarsa(λ) in  ...  It does not require prior environmental information and relies only on interaction with the environment to conduct the trial-and-error process and accumulates experience to learn the optimal control policy  ...  The use of eligibility traces reduces the number of episodes required by the algorithm to find the optimal policy, which improves the learning efficiency.  ... 
doi:10.1109/access.2019.2935255 fatcat:ugvxdekwjvhj7fqto3idebtyjy

SIBRE: Self Improvement Based REwards for Adaptive Feedback in Reinforcement Learning [article]

Somjit Nath, Richa Verma, Abhik Ray, Harshad Khadilkar
2020 arXiv   pre-print
We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards, or SIBRE.  ...  Experiments on several well-known benchmark environments with different RL algorithms show that SIBRE converges to the optimal policy faster and more stably.  ...  We assume the existence of a reinforcement learning algorithm for learning the optimal mapping S → A.  ... 
arXiv:2004.09846v3 fatcat:2dqbb5kktzatnoin3oztaqgq3q

Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction [article]

Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare
2019 arXiv   pre-print
Text-based games are a natural challenge domain for deep reinforcement learning algorithms.  ...  Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.  ...  Conclusions and Future work We introduced two algorithmic improvements for deep reinforcement learning applied to interactive fiction (IF).  ... 
arXiv:1911.12511v1 fatcat:jbjomsgubneyhaksbbwgyus33e

Improving Reinforcement Learning with Human Input

Matthew E. Taylor
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
Reinforcement learning (RL) has had many successes when learning autonomously. This paper and accompanying talk consider how to make use of a non-technical human participant, when available.  ...  In particular, we consider the case where a human could 1) provide demonstrations of good behavior, 2) provide online evaluative feedback, or 3) define a curriculum of tasks for the agent to learn on.  ...  We therefore updated our curriculum learning algorithm to be biased in learning the target task towards concepts that were most frequently seen in the curriculum.  ... 
doi:10.24963/ijcai.2018/817 dblp:conf/ijcai/Taylor18 fatcat:fkj3vl77pva3bez2uvtfaxrcfu

Algorithmic Improvements for Deep Reinforcement Learning Applied to Interactive Fiction

Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Text-based games are a natural challenge domain for deep reinforcement learning algorithms.  ...  Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.  ...  Conclusions and Future work We introduced two algorithmic improvements for deep reinforcement learning applied to interactive fiction (IF).  ... 
doi:10.1609/aaai.v34i04.5857 fatcat:tlufceqrlfbunkccc6yeor2wx4

Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching [chapter]

Long-Ji Lin
1992 Reinforcement Learning  
that will speed up reinforcement learning.  ...  To date, reinforcement learning has mostly been studied solving simple learning tasks. Reinforcement learning methods that have been studied so far typically converge slowly.  ...  number of hidden units of the evaluation, policy and utility networks; • ne , rp and tu: the learning rate of the backpropagation algorithm for the evaluation, policy and utility networks; • the momentum  ... 
doi:10.1007/978-1-4615-3618-5_5 fatcat:rth3jyl4lfcnvltmknuoxijram

Reinforcement Learning for Improving Object Detection [article]

Siddharth Nayak, Balaraman Ravindran
2020 arXiv   pre-print
In this paper, we introduce an algorithm called ObjectRL to choose the amount of a particular pre-processing to be applied to improve the object detection performances of pre-trained networks.  ...  The main motivation for ObjectRL is that an image which looks good to a human eye may not necessarily be the optimal one for a pre-trained object detector to detect objects.  ...  We use Adam Optimizer [12] with a learning rate of 10^-3. We use an ε-greedy method for exploration where we anneal ε linearly with the number of episodes until it reaches 0.05.  ... 
arXiv:2008.08005v1 fatcat:6wwpphxmgrcspjegbjov5gqf5e

Reinforcement Learning for Improving Agent Design

David Ha
2019 Artificial Life  
In many reinforcement learning tasks, the goal is to learn a policy to manipulate an agent, whose design is fixed, to maximize some notion of cumulative reward.  ...  The design of the agent's physical structure is rarely optimized for the task at hand.  ...  While the original design is symmetric, the learned design (Table 1) breaks symmetry and biases towards larger rear legs while jointly learning the navigation policy using an asymmetric body.  ... 
doi:10.1162/artl_a_00301 pmid:31697584 fatcat:xxf3gcdnojgnlag3og72omums4

Improving the dynamics of quantum sensors with reinforcement learning [article]

Jonas Schuff, Lukas J. Fiderer, Daniel Braun
2019 arXiv   pre-print
Here, we use the cross entropy method of reinforcement learning to optimize the strength and position of control pulses.  ...  By visualizing the evolution of the quantum state, the mechanism exploited by the reinforcement learning method is identified as a kind of spin-squeezing strategy that is adapted to the superradiant damping  ...  For training we use the Adam optimizer [54] with learning rate 0.001.  ... 
arXiv:1908.08416v1 fatcat:sjclnxh2cjgwza7xjqyypxlelu

Improving Reinforcement Learning Speed for Robot Control

Laetitia Matignon, Guillaume Laurent, Nadine Fort-piat
2006 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems  
Reinforcement Learning (RL) is an intuitive way of programming well-suited for use on autonomous robots because it does not need to specify how the task has to be achieved.  ...  In this paper, we develop a theoretical study of the influence of some RL parameters over the learning speed.  ...  Under some conditions [12], Q-learning algorithm is guaranteed to converge to the optimal value function Q*.  ... 
doi:10.1109/iros.2006.282341 dblp:conf/iros/MatignonLF06 fatcat:xnhnia3uyzdsrofkmiuseaysia

Likelihood Quantile Networks for Coordinating Multi-Agent Reinforcement Learning [article]

Xueguang Lyu, Christopher Amato
2020 arXiv   pre-print
In particular, each agent considers the likelihood that other agent exploration and policy changes are occurring, essentially utilizing the agent's own estimations to weigh the learning rate that should  ...  be applied towards the given samples.  ...  CONCLUSION This paper describes a novel distributional RL method for improving performance in cooperative multi-agent reinforcement learning settings.  ... 
arXiv:1812.06319v6 fatcat:6lsfmhoww5ffjlkwvuwrw4z5je

Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning [article]

Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara
2022 arXiv   pre-print
In this paper, we propose cautious policy programming (CPP), a novel value-based reinforcement learning (RL) algorithm that can ensure monotonic policy improvement during learning.  ...  Based on the nature of entropy-regularized RL, we derive a new entropy regularization-aware lower bound of policy improvement that only requires estimating the expected policy advantage function.  ...  CPP made a step towards practical monotonic improving RL by leveraging entropy-regularized RL. However, there is still room for improvement.  ... 
arXiv:2107.05798v3 fatcat:g3uicog4tzhgjph2g6xwlujkby

Improving Maneuver Strategy in Air Combat by Alternate Freeze Games with a Deep Reinforcement Learning Algorithm

Zhuang Wang, Hui Li, Haolin Wu, Zhaoxin Wu
2020 Mathematical Problems in Engineering  
Agents are trained by alternate freeze games with a deep reinforcement algorithm to deal with nonstationarity.  ...  Middleware which connects the agents and air combat simulation software is developed to provide a reinforcement learning environment for agent training.  ...  aircraft model and other functions are the same as those proposed in this paper. This environment and the RL agent are packaged as a supplementary material. Through this material, the alternate freeze game DQN algorithm  ... 
doi:10.1155/2020/7180639 fatcat:lmszqmkfanavtchi6ymx5bghte