56,055 Hits in 2.2 sec

Discovering Reinforcement Learning Algorithms [article]

Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver
2021 arXiv   pre-print
Reinforcement learning (RL) algorithms update an agent's parameters according to one of several possible rules, discovered manually through years of research.  ...  This shows the potential to discover general RL algorithms from data.  ...  Discovering Reinforcement Learning Algorithms There have been a few attempts to meta-learn RL algorithms, from earlier work on bandit algorithms [26, 25] to curiosity algorithms [1] and RL objectives  ... 
arXiv:2007.08794v3 fatcat:x4kkbyz5vzagjhzuuuanld36li

An Energy-Efficient Topology Control Algorithm Based on Reinforcement Learning for Wireless Sensor Networks

Thien T. T. Le, Sangman Moh
2017 International Journal of Control and Automation  
The proposed reinforcement-learning-based communication range control (RL-CRC) algorithm learns the varying network link characteristics using the reinforcement learning technique and gives an optimal  ...  The reinforcement learning based on the so-called Q-learning adapts to changes of node connectivity and, thus, the nodes discover their neighbors and then adaptively control their communication range accordingly  ...  Acknowledgments This paper is a revised and expanded version of a paper entitled "Reinforcement-Learning-Based Topology Control for Wireless Sensor Networks" presented at the 3rd International Conference  ... 
doi:10.14257/ijca.2017.10.5.22 fatcat:llbdq3cbczcmpbl6qd7ruz4fue

Long-Term Learning for Algorithm Control [chapter]

Tom Carchrae
2004 Lecture Notes in Computer Science  
Each algorithm is run, one after another, passing the best known solution to the next run. Reinforcement learning is used during the solving to allocate the run-time of each run of an algorithm.  ...  We wish to discover the optimal time allocation to algorithms during each iteration. So far, we have only looked at online learning so that on each problem the learning starts from scratch.  ... 
doi:10.1007/978-3-540-30201-8_71 fatcat:6dxt7w4ejnhl7g34jklwywymdm

Reinforcement learning based local search for grouping problems: A case study on graph coloring

Yangming Zhou, Jin-Kao Hao, Béatrice Duval
2016 Expert systems with applications  
This paper makes the following contributions: we show that (1) reinforcement learning can help obtain useful information from discovered local optimum solutions; (2) the learned information can be advantageously  ...  used to guide the search algorithm towards promising regions.  ... 
doi:10.1016/j.eswa.2016.07.047 fatcat:buezej6m55d5pi6bnmf4ola4ni

Page 129 of Journal of Cognitive Neuroscience Vol. 11, Issue 1 [page]

1999 Journal of Cognitive Neuroscience  
Another question that I would like to have seen addressed is the time-scale invariance or lack thereof in reinforcement learning algorithms, particularly time-difference algorithms.  ...  If so, an algorithm that did a good job of discovering a scheduling policy for elevators that moved at one speed would not do a good job of discovering a scheduling policy for elevators that moved  ... 

Meta-Gradient Reinforcement Learning with an Objective Discovered Online [article]

Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver
2020 arXiv   pre-print
Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network.  ...  We demonstrate that the algorithm discovers how to address several important issues in RL, such as bootstrapping, non-stationarity, and off-policy learning.  ...  We call this algorithm FRODO (Flexible Reinforcement Objective Discovered Online).  ... 
arXiv:2007.08433v1 fatcat:ljl2ig64rffmphbluh24zpceoq

Augmenting Policy Learning with Routines Discovered from a Single Demonstration [article]

Zelin Zhao, Chuang Gan, Jiajun Wu, Xiaoxiao Guo, Joshua B. Tenenbaum
2021 arXiv   pre-print
Our approach enables imitating expert behavior at multiple temporal scales for imitation learning and promotes reinforcement learning exploration.  ...  Extensive experiments on Atari games demonstrate that RAPL improves the state-of-the-art imitation learning method SQIL and reinforcement learning method A2C.  ...  On the other hand, we use routines to promote the standard reinforcement learning algorithm A2C. We formulate the learning targets for those two algorithms in the following paragraphs. RAPL-SQIL.  ... 
arXiv:2012.12469v4 fatcat:rqn7tgpm5jb2rcs67egr7ipzxa

A competitive approach to game learning

Christopher D. Rosin, Richard K. Belew
1996 Proceedings of the ninth annual conference on Computational learning theory - COLT '96  
strategy learning algorithm to discover strong strategy for the game.  ...  A learns game from reinforcement algorithm. This is strategy learning algorithm.  ... 
doi:10.1145/238061.238153 dblp:conf/colt/RosinB96 fatcat:z5jhvthu4jdt5btqe5s2eeamfq

User Role Discovery and Optimization Method based on K-means + Reinforcement learning in Mobile Applications [article]

Yuanbang Li
2021 arXiv   pre-print
Thirdly, a reinforcement learning algorithm is proposed to strengthen the clustering effect of user roles and improve the stability of the clustering result.  ...  Secondly, K Means algorithm is used to discover user roles from user features.  ...  Therefore, based on using the K-Means algorithm to discover the user's role, this study uses the reinforcement learning method to learn the division of which role the user belongs to, which improves the  ... 
arXiv:2107.00862v1 fatcat:jxrorr67o5a3xp63ebceumns7q

A Technique for Web Page Ranking by Applying Reinforcement Learning

Vivek Deshmukh, S. S.
2016 International Journal of Computer Applications  
Reinforcement learning strategy learns from every connection with dynamic environment. In this paper Reinforcement learning (RL) ranking algorithm is proposed.  ...  Every site page is considered as a state and fundamental point is to discover score of website page.  ...  This algorithm is based on the generalization of the reinforcement learning concepts for learning the ranking functions on graphs.  ... 
doi:10.5120/ijca2016911237 fatcat:m2y3pemvfrfqrhx7zwzdnuxzx4

Contingent Features for Reinforcement Learning [chapter]

Nathan Sprague
2014 Lecture Notes in Computer Science  
Applying reinforcement learning algorithms in real-world domains is challenging because relevant state information is often embedded in a stream of high-dimensional sensor data.  ...  This paper describes a novel algorithm for learning task-relevant features through interactions with the environment.  ...  Conclusions Feature discovery for reinforcement learning presents a chicken and egg problem.  ... 
doi:10.1007/978-3-319-11179-7_44 fatcat:tnn7umf64fcttcq3bfqdrjt4du


Anshul Chaturvedi
2016 International Journal of Research in Engineering and Technology  
Today reinforcement learning (RL) is holding the attention in research area under Machine Learning and AI.  ...  Hierarchical Reinforcement Learning (HRL) that break down the RL problem into sub-problems where solving of each sub-problem will be more powerful than solving the whole problem will be present in this  ...  Option-based HRL's Course-Scheduling Algorithm Traditional timetable scheduling system implements Reinforcement Learning algorithm.  ... 
doi:10.15623/ijret.2016.0502047 fatcat:56nuszbcbjegzn3rpoudxve75i

Financial Trading as a Game: A Deep Reinforcement Learning Approach [article]

Chien Yi Huang
2018 arXiv   pre-print
We employ a substantially small replay memory (only a few hundreds in size) compared to ones used in modern deep reinforcement learning algorithms (often millions in size.) 2.  ...  Recent advance in deep reinforcement learning provides a framework toward end-to-end training of such trading agent.  ...  Introduction In this paper we investigate the effectiveness of applying deep reinforcement learning algorithms to the financial trading domain.  ... 
arXiv:1807.02787v1 fatcat:um6gknn7sbgbnjlzinqbh5rgxu

Learning Purposeful Behaviour in the Absence of Rewards [article]

Marlos C. Machado, Michael Bowling
2016 arXiv   pre-print
In this paper we present an algorithm capable of learning purposeful behaviour in the absence of rewards.  ...  In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provide a notion of progress.  ...  This work was supported by grants from Alberta Innovates Technology Futures and the Alberta Innovates Centre for Machine Learning (AICML).  ... 
arXiv:1605.07700v1 fatcat:6ob7d5uhnvegxntmnmlkcbzb3m

Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search [article]

Yuan Tian, Qin Wang, Zhiwu Huang, Wen Li, Dengxin Dai, Minghao Yang, Jun Wang, Olga Fink
2020 arXiv   pre-print
In this paper, we introduce a new reinforcement learning (RL) based neural architecture search (NAS) methodology for effective and efficient generative adversarial network (GAN) architecture search.  ...  ., CIFAR-10 and STL-10) demonstrates that the proposed method is able to discover highly competitive architectures for generally better image generation results with a considerably reduced computational  ...  The AutoGAN algorithm is based on on-policy reinforcement learning.  ... 
arXiv:2007.09180v1 fatcat:ce4zkm6mmjdejjewcvny3o6cia