11,277 Hits in 4.3 sec

Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards [article]

Susan Amin
2021 arXiv   pre-print
We provide empirical evaluations of our approach in a simulated 2D navigation task, as well as higher-dimensional MuJoCo continuous control locomotion tasks with sparse rewards.  ...  A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structures and continuous state and action spaces.  ...  Walter Reisner for providing valuable feedback on the initial direction of this work, and Riashat Islam for helping with the experiments in the early stages of the project.  ... 
arXiv:2012.13658v2 fatcat:k2u36r33qfdf5nkr5t7a56j64u

Overcoming Exploration in Reinforcement Learning with Demonstrations [article]

Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel
2018 arXiv   pre-print
Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL).  ...  In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a  ...  However, exploration in an environment with sparse reward is difficult since with random exploration, the agent rarely sees a reward signal.  ... 
arXiv:1709.10089v2 fatcat:m56ny7kxaneitbunujtmar7hbm

Accelerated Robot Learning via Human Brain Signals [article]

Iretiayo Akinola, Zizhao Wang, Junyao Shi, Xiaomin He, Pawan Lapborisuth, Jingxi Xu, David Watkins-Valls, Paul Sajda, Peter Allen
2020 arXiv   pre-print
In reinforcement learning (RL), sparse rewards are a natural way to specify the task to be learned.  ...  Using a robotic navigation task as a test bed, we show that our method achieves a stable obstacle-avoidance policy with a high success rate, outperforming learning from sparse rewards only that struggles  ...  Efficient Sparse-Reward RL with Guided Exploration The final stage is to enable the RL agent to learn efficiently in an environment with sparse rewards.  ... 
arXiv:1910.00682v2 fatcat:ci5t63ztmfdr5p5kksabum4am4

Sparse Reward Exploration via Novelty Search and Emitters [article]

Giuseppe Paolo
2021 arXiv   pre-print
In this work, we introduce the SparsE Reward Exploration via Novelty and Emitters (SERENE) algorithm, capable of efficiently exploring a search space, as well as optimizing rewards found in potentially  ...  The need for efficient exploration is even more significant in sparse reward settings, in which performance feedback is given sparingly, thus rendering it unsuitable for guiding the search process.  ...  The emitters then perform both local exploration and exploitation of the reward, leading to degraded performance in settings with very sparse rewards, where not all policies can obtain a reward.  ... 
arXiv:2102.03140v1 fatcat:wmxn4sqtjrhwnj2uy4rorwdmoq

Monte-Carlo Tree Search for Policy Optimization [article]

Xiaobai Ma, Katherine Driggs-Campbell, Zongzhang Zhang, Mykel J. Kochenderfer
2019 arXiv   pre-print
We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.  ...  Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points.  ...  Zongzhang Zhang is in part supported by the National Natural Science Foundation of China under Grant No. 61876119, and the Natural Science Foundation of Jiangsu under Grant No.  ... 
arXiv:1912.10648v1 fatcat:qh6rcvalnjherbb64zxyfxrhba

PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning [article]

Guillaume Matheron, Nicolas Perrin, Olivier Sigaud
2020 arXiv   pre-print
The exploration-exploitation trade-off is at the heart of reinforcement learning (RL). However, most continuous control benchmarks used in recent RL research only require local exploration.  ...  This led to the development of algorithms that have basic exploration capabilities, and behave poorly in benchmarks that require more versatile exploration.  ...  In this paper, we consider environments that present a hard exploration problem with a sparse reward.  ... 
arXiv:2004.11667v1 fatcat:vocuy3i7vvb4jg6xp67rhblxhi

Locus coeruleus reports changes in environmental contingencies

Susan J. Sara
2016 Behavioral and Brain Sciences  
Careful perusal of the sparse data available from recording studies in animals reveals that noradrenergic neurons are excited mainly by any change in the environment – a salient, novel, or unexpected sensory stimulus or a change in behavioral contingencies.  ...  Several other similar studies in humans have confirmed that LC is co-activated with frontal regions or has increased functional connectivity with them in cognitively demanding tasks requiring rapid shifts  ... 
doi:10.1017/s0140525x15001946 pmid:28347363 fatcat:cypweo2azjhvdozhhgfvep2pkq

Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance [article]

Mingxuan Jing, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Chao Yang, Bin Fang, Huaping Liu
2019 arXiv   pre-print
In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations.  ...  We further demonstrate that such a problem can be addressed efficiently by performing a local linear search on its dual form.  ...  It was also partially supported by the National Science Foundation of China (NSFC) and the German Research Foundation (DFG) in project Cross Modal Learning, NSFC 61621136008/DFG TRR-169.  ... 
arXiv:1911.07109v2 fatcat:ms5ms5c24bfrzas4v5m3binfl4

A survey on intrinsic motivation in reinforcement learning [article]

Arthur Aubret, Laetitia Matignon, Salima Hassas
2019 arXiv   pre-print
In this article, we provide a survey on the role of intrinsic motivation in DRL.  ...  We choose to survey these research works from the perspective of learning how to achieve tasks.  ...  Such environments with sparse rewards are almost impossible to solve with the above-mentioned exploration policies since the agent does not have local indications of how to improve its policy.  ... 
arXiv:1908.06976v2 fatcat:xxi3jgdtbjakvprzgkpbpohllu

Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance

Mingxuan Jing, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Chao Yang, Bin Fang, Huaping Liu
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations.  ...  We further demonstrate that such a problem can be addressed efficiently by performing a local linear search on its dual form.  ...  It was also partially supported by the National Science Foundation of China (NSFC) and the German Research Foundation (DFG) in project Cross Modal Learning, NSFC 61621136008/DFG TRR-169.  ... 
doi:10.1609/aaai.v34i04.5953 fatcat:pkw7h5h7mfduni325g4ko6mheq

COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration [article]

Nicholas Watters, Loic Matthey, Matko Bosnjak, Christopher P. Burgess, Alexander Lerchner
2019 arXiv   pre-print
Here we introduce a modular approach to addressing these challenges in a continuous control environment, without using hand-crafted or supervised information.  ...  Our Curious Object-Based seaRch Agent (COBRA) uses task-free intrinsically motivated exploration and unsupervised learning to build object-based models of its environment and action space.  ...  predictor or a Value predictor, when solving Goal Finding tasks with sparse terminal rewards only.  ... 
arXiv:1905.09275v2 fatcat:gvkoisskqnadvghggtc7i3xnxy

Learning Value Functions in Deep Policy Gradients using Residual Variance [article]

Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux
2021 arXiv   pre-print
Furthermore, we validate our method in tasks with sparse rewards, where we provide experimental evidence and theoretical insights.  ...  We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms.  ...  While such methods demonstrate great performance in continuous control tasks, several discrepancies persist between what motivates the conceptual framework of these algorithms and what is implemented in  ... 
arXiv:2010.04440v3 fatcat:z2d6bty4rvahvmuzvhjh2quo6y

Beyond stimulus cues and reinforcement signals: A new approach to animal metacognition

Justin J. Couchman, Mariana V. C. Coutinho, Michael J. Beran, J. David Smith
2010 Journal of Comparative Psychology  
To explore this possibility, we placed humans and monkeys in successive uncertainty-monitoring tasks that were qualitatively different, eliminating many associative cues that might support transfer across tasks.  ...  Gale: Continuity task (Figure 2B). In this task, Gale performed as in the Length task.  ... 
doi:10.1037/a0020129 pmid:20836592 pmcid:PMC2991470 fatcat:adhktbnp7jcvbktow2bpriuibu

TAAC: Temporally Abstract Actor-Critic for Continuous Control [article]

Haonan Yu, Wei Xu, Haichao Zhang
2021 arXiv   pre-print
We demonstrate TAAC's advantages over several strong baselines across 14 continuous control tasks.  ...  This suggests that aside from encouraging persistent exploration, action repetition can find its place in a good policy behavior. Code is available at https://github.com/hnyu/taac.  ...  and BipedalWalkerHardcore; d) Manipulation: Four Fetch (Plappert et al., 2018) tasks with sparse rewards and hard exploration (reward given only upon success): FetchReach, FetchPush, FetchSlide, and FetchPickAndPlace  ... 
arXiv:2104.06521v3 fatcat:7aj5ix6g4jdwng2vh62mszkhsa

Visual Reinforcement Learning with Imagined Goals [article]

Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine
2018 arXiv   pre-print
We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for  ...  In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies.  ...  We would also like to thank Carlos Florensa for making multiple useful suggestions on a later version of the draft.  ... 
arXiv:1807.04742v2 fatcat:65icyi2f6vctfjluyig6vqo5su
Showing results 1 — 15 out of 11,277 results