RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning [article]

Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel
2016 arXiv   pre-print
In our proposed method, RL^2, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm.  ...  Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data.  ...  That is, the "fast" RL algorithm is a computation whose state is stored in the RNN activations, and the RNN's weights are learned by a general-purpose "slow" reinforcement learning algorithm.  ... 
arXiv:1611.02779v2 fatcat:5uies6uzlnhwpdmjwx3ofnz4oq

Fast Adaptation via Policy-Dynamics Value Functions [article]

Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus
2020 arXiv   pre-print
An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned.  ...  At test time, a few actions are sufficient to infer the environment embedding, enabling a policy to be selected by maximizing the learned value function (which requires no additional environment interaction  ...  Reinforcement learning transfer via sparse coding. In Proceedings of the 11th international conference on autonomous agents and multiagent systems, volume 1, pp. 383-390.  ... 
arXiv:2007.02879v1 fatcat:zhfr27zs3rfphftwrijjlrx57q

Intelligent Inventory Control via Ruminative Reinforcement Learning

Tatpong Katanyukul, Edwin K. P. Chong
2014 Journal of Applied Mathematics  
Ruminative reinforcement learning (RRL) has been introduced recently based on this approach.  ...  Inventory management is a sequential decision problem that can be solved with reinforcement learning (RL).  ...  Reinforcement learning (RL) [2] provides a framework to find an approximate solution.  ... 
doi:10.1155/2014/238357 fatcat:wfcwezsuanevzop4kfmxcxiimq

Non-Cooperative Energy Efficient Power Allocation Game in D2D Communication: A Multi-Agent Deep Reinforcement Learning Approach

Khoi Khac Nguyen, Trung Q Duong, Ngo Anh Vien, Nhien-An Le-Khac, Nghia M Nguyen
2019 IEEE Access  
INDEX TERMS Energy efficient wireless communication, power allocation, D2D communication, multiagent reinforcement learning, deep reinforcement learning.  ...  In this paper, we propose to use reinforcement learning, an efficient simulation-based optimization framework, to tackle this problem so that user experience is maximized.  ...  Reinforcement learning (RL) [2] is a sub-field of machine learning which offers a mathematically principled framework studying how an autonomous agent makes optimal sequential decisions.  ... 
doi:10.1109/access.2019.2930115 fatcat:vuyjpjpumzamxhlulczi4hipwy

Reinforcement-guided learning in frontal neocortex: emerging computational concepts

Abhishek Banerjee, Rajeev V Rikhye, Adam Marblestone
2021 Current Opinion in Behavioral Sciences  
In this framework, reward drives plasticity in various neocortical regions, implementing multiple distinct reinforcement learning algorithms.  ...  Candidate functions for such neocortical contributions to reinforcement learning are increasingly being considered in artificial intelligence algorithms.  ... 
doi:10.1016/j.cobeha.2021.02.019 fatcat:e23xwqtibzgidcuozvfdm47dzy

A Review of Reinforcement Learning for Autonomous Building Energy Management [article]

Karl Mason, Santiago Grijalva
2019 arXiv   pre-print
The main direction for future research and challenges in reinforcement learning are also outlined.  ...  Reinforcement learning is one of the most prominent machine learning algorithms used for control problems and has had many successful applications in the area of building energy management.  ...  This is referred to as Meta learning [40]. Meta learning has been applied to RL algorithms. For example the RL^2 algorithm consists of a fast and slow RL algorithm [41].  ... 
arXiv:1903.05196v2 fatcat:lihv4ftuovc3hhmofr7vpgn3mq

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning [article]

Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson
2020 arXiv   pre-print
Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning.  ...  In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during  ...  A prominent model-free meta-RL approach is to utilise the dynamics of recurrent networks for fast adaptation (RL^2, Wang et al. (2016); Duan et al. (2016)).  ... 
arXiv:1910.08348v2 fatcat:tzfn3oig2rea7dby3e45npcija

Meta-Reinforcement Learning of Structured Exploration Strategies [article]

Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine
2018 arXiv   pre-print
Exploration is a fundamental challenge in reinforcement learning (RL).  ...  We introduce a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience.  ...  In comparison, MAML and RL^2 don't learn behaviors that explore as effectively.  ... 
arXiv:1802.07245v1 fatcat:uzio7xkm6jhrvc36cqnnd4pqem

Towards Playing Full MOBA Games with Deep Reinforcement Learning [article]

Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi (+6 others)
2020 arXiv   pre-print
In this paper, we propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning.  ...  Specifically, we develop a combination of novel and existing learning techniques, including curriculum self-play learning, policy distillation, off-policy adaption, multi-head value estimation, and Monte-Carlo  ...  We divide heroes into groups, and start with training fixed lineups, i.e., 5 fixed heroes VS another 5 fixed heroes, via self-play RL. 2) Distillation.  ... 
arXiv:2011.12692v4 fatcat:avem3bna6nbcvbftblgue2ouw4

Reinforcement Learning and its Connections with Neuroscience and Psychology [article]

Ajay Subramanian, Sharad Chitlangia, Veeky Baths
2021 arXiv   pre-print
Reinforcement learning methods have recently been very successful at performing complex sequential tasks like playing Atari games, Go and Poker.  ...  In this paper, we comprehensively review a large number of findings in both neuroscience and psychology that evidence reinforcement learning as a promising candidate for modeling learning and decision  ...  Model-based reinforcement learning seeks to mimic these capabilities and is a promising area both in RL [2] (Figure 3) and as a computational model for biological learning [102] .  ... 
arXiv:2007.01099v5 fatcat:mjpkztlmqnfjba3dtcwqwmmlvu

Active inference: demystified and compared [article]

Noor Sajid, Philip J. Ball, Thomas Parr, Karl J. Friston
2020 arXiv   pre-print
In this paper, we provide: 1) an accessible overview of the discrete-state formulation of active inference, highlighting natural behaviors in active inference that are generally engineered in RL; 2) an  ...  This problem is also considered in reinforcement learning (RL), but limited work exists on comparing the two approaches on the same discrete-state environments.  ...  (via a learning rate: η).  ... 
arXiv:1909.10863v3 fatcat:x5bwlzhyzvde3bh24er3zyvewm

Deep Reinforcement Learning with Shallow Controllers: An Experimental Application to PID Tuning [article]

Nathan P. Lawrence, Michael G. Forbes, Philip D. Loewen, Daniel G. McClement, Johan U. Backstrom, R. Bhushan Gopaluni
2021 arXiv   pre-print
Deep reinforcement learning (RL) is an optimization-driven framework for producing control strategies for general dynamical systems without explicit reliance on process models.  ...  Out of RL-1 to RL-4, RL-2 and RL-3 remain the most promising results.  ...  Figure 4: Performance evolution under two different reward functions: (a) Tracking in experiment for RL-2; (b) Tracking heatmap for RL-2: x-axis is progression of episode cycles (2 step changes  ... 
arXiv:2111.07171v1 fatcat:k5o6g7qg4fhavhym56nmfm7rkm

Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [article]

Sungryull Sohn, Hyunjae Woo, Jongwook Choi, Honglak Lee
2020 arXiv   pre-print
To facilitate learning, we adopt an intrinsic reward inspired by upper confidence bound (UCB) that encourages efficient exploration.  ...  Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference (MSGI), which infers the latent parameter of the task by interacting with the environment and maximizes  ...  more effective than learning slow-parameters and fast-parameters (e.g., RNN states) on those tasks involving complex subtask dependencies.  ... 
arXiv:2001.00248v2 fatcat:ahnce2tz3ncvhmebyd2qrcrgvm

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments [article]

Maruan Al-Shedivat, Trapit Bansal, Yuri Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel
2018 arXiv   pre-print
In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework.  ...  Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.  ...  Second, adaptation via RL^2 tended to perform equally or a little worse than plain LSTM with or without tracking in this setup.  ... 
arXiv:1710.03641v2 fatcat:ouvxo5wmevbxbdakfjswocegpy

Meta-Reinforcement Learning by Tracking Task Non-stationarity [article]

Riccardo Poiani, Andrea Tirinzoni, Marcello Restelli
2021 arXiv   pre-print
Meta-reinforcement learning (RL) has been shown successful for training agents that quickly adapt to related tasks.  ...  At test time, TRIO tracks the evolution of the latent parameters online, hence reducing the uncertainty over future tasks and obtaining fast adaptation through the meta-learned policy.  ...  RL^2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016. [Finn et al., 2017] C. Finn, P. Abbeel, and S. Levine.  ... 
arXiv:2105.08834v1 fatcat:vblxg6i35fgmjl4tkyhdtmdjeu