3,553 Hits in 11.1 sec

XCSF with tile coding in discontinuous action-value landscapes

Pier Luca Lanzi, Daniele Loiacono
2015 Evolutionary Intelligence  
Tile coding is an effective reinforcement learning method that uses a rather ingenious generalization mechanism based on (i) a carefully designed parameter setting and (ii) the assumption that nearby states  ...  Our comparison was based on a set of well-known reinforcement learning environments (2D Gridworld and the Mountain Car) that involved no action-value discontinuities and so posed no challenge to tabular  ...  Acknowledgments The authors wish to thank the reviewers for their invaluable comments and suggestions regarding possible extensions of the approach using a more competent genetic algorithm.  ... 
doi:10.1007/s12065-015-0129-7 fatcat:jlrbw2xgnbfo3cnxpz6c36rtki
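
As an aside for readers skimming these results, here is a minimal sketch of the tile-coding mechanism the abstract alludes to, in Python; the number of tilings, the tile resolution, and the value range are illustrative assumptions, not taken from the paper.

    import numpy as np

    def tile_features(x, n_tilings=4, n_tiles=8, low=0.0, high=1.0):
        """Map a scalar state x to a binary feature vector.

        Each tiling partitions [low, high] into n_tiles intervals, and the
        tilings are offset from one another, so nearby states share most of
        their active tiles -- the 'nearby states' assumption the abstract
        refers to."""
        features = np.zeros(n_tilings * n_tiles)
        width = (high - low) / n_tiles
        for t in range(n_tilings):
            offset = t * width / n_tilings               # stagger each tiling
            idx = min(int((x - low + offset) / width), n_tiles - 1)
            features[t * n_tiles + idx] = 1.0
        return features

A discontinuity in the action-value landscape breaks exactly this sharing between neighbors, which is the failure mode the paper studies.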

Learning to Play Pong using Policy Gradient Learning [article]

Somnuk Phon-Amnuaisuk
2018 arXiv   pre-print
Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular, the following parameters: state values, V; state-action values, Q; and policy,  ...  To get around this, the RL problem is commonly formulated to learn a specific task using hand-crafted input features to curb the size of the array.  ...  Policy gradient learning differs from tabular state-value learning and tabular state-action-value learning, since a tabular approach updates its table entry by entry.  ... 
arXiv:1807.08452v1 fatcat:oa6tigear5ebdmot2gccdydf4a
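
To make the snippet's contrast with entry-by-entry tabular updates concrete, a minimal REINFORCE-style policy-gradient sketch; the linear-softmax policy and the hyperparameters are illustrative assumptions, not the paper's Pong setup.

    import numpy as np

    def softmax(z):
        z = z - z.max()                  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def reinforce_update(theta, episode, alpha=0.01, gamma=0.99):
        """One REINFORCE update over a finished episode, given as a list of
        (features, action, reward) triples; theta has shape
        (n_actions, n_features). The whole parameter matrix moves on every
        update, so similar states generalize -- unlike a table updated
        entry by entry."""
        G = 0.0
        grad = np.zeros_like(theta)
        for phi, a, r in reversed(episode):
            G = r + gamma * G                           # return from this step
            probs = softmax(theta @ phi)
            one_hot = np.zeros(len(probs))
            one_hot[a] = 1.0
            grad += G * np.outer(one_hot - probs, phi)  # grad of log pi(a|s)
        return theta + alpha * grad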

LEARNING IN A DYNAMIC ENVIRONMENT [chapter]

1995 Problem Solving in a Dynamic Environment  
A key role is played by the ability to generalise from learning experiences, and this study attempts to combine reinforcement learning with neural networks to this end.  ...  This environment demonstrates the benefits to a table-based learner of short-term memory, a generalising state representation and planning with a Dyna architecture.  ...  from similar states, and as such can be seen as taking a step up the hierarchy of cognitive capabilities.  ... 
doi:10.1142/9789812831507_0008 fatcat:jklqlbu6knh6ndp2wh2u4yjmxi
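
A compact sketch of the Dyna planning loop the snippet credits with part of the benefit; it assumes a deterministic environment, a Q-table stored as a defaultdict(float), and illustrative hyperparameters.

    import random
    from collections import defaultdict

    Q = defaultdict(float)   # tabular action values, keyed by (state, action)
    model = {}               # learned deterministic model: (s, a) -> (r, s')

    def dyna_q_step(s, a, r, s2, actions,
                    alpha=0.1, gamma=0.95, n_planning=10):
        """One Dyna-Q step: a direct tabular update from real experience,
        followed by n_planning simulated updates replayed from the model."""
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                              - Q[(s, a)])
        model[(s, a)] = (r, s2)
        for _ in range(n_planning):
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            Q[(ps, pa)] += alpha * (pr
                                    + gamma * max(Q[(ps2, b)] for b in actions)
                                    - Q[(ps, pa)])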

USAK METHOD FOR THE REINFORCEMENT LEARNING

Mykhailo Novotarskyi, Valentin Kuzmich
2020 Information, Computing and Intelligent systems  
In the field of reinforcement learning, tabular methods have become widespread. There are many important scientific results that significantly improve their performance in specific applications.  ...  However, the application of tabular methods is limited due to the large amount of resources required to store value functions in tabular form under high-dimensional state spaces.  ...  From this graph we can conclude that the speed of learning has no critical dependence on the dimensionality of the state space.  ... 
doi:10.20535/2708-4930.1.2020.216042 fatcat:citwm63udnaslfa7hr7jvq2sp4
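
The resource limitation the snippet points to is easy to make concrete; a back-of-the-envelope sketch, with grid sizes chosen for illustration only:

    # A dense Q-table over a discretized state space needs one entry per
    # (state, action) pair: bins_per_dim ** n_dims * n_actions values.
    bins_per_dim, n_actions = 30, 9
    for n_dims in (2, 4, 6):
        entries = bins_per_dim ** n_dims * n_actions
        print(f"{n_dims} dims: {entries:,} entries, "
              f"{entries * 8 / 1e9:.2f} GB as float64")
    # 2 dims: 8,100 entries; 4 dims: ~7.3 million; 6 dims: ~6.6 billion
    # (about 52 GB) -- the blow-up that motivates non-tabular methods.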

Solving reward-collecting problems with UAVs: a comparison of online optimization and Q-learning [article]

Yixuan Liu and Chrysafis Vogiatzis and Ruriko Yoshida and Erich Morman
2021 arXiv   pre-print
We present a comparison of three methods to solve this problem: namely we implement a Deep Q-Learning model, an ε-greedy tabular Q-Learning model, and an online optimization framework.  ...  Uncrewed autonomous vehicles (UAVs) have made significant contributions to reconnaissance and surveillance missions in past US military campaigns.  ...  Erich Morman: Modeled and implemented the ε-greedy tabular Q-Learning. Additionally conducted computational experiments using ε-greedy Q-learning.  ... 
arXiv:2112.00141v1 fatcat:motshvd4qrfvphe2hnoppyib4q
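
For reference, the ε-greedy rule named in the abstract, in minimal tabular form; Q is assumed to be a dict keyed by (state, action) pairs.

    import random

    def epsilon_greedy(Q, s, actions, epsilon=0.1):
        """With probability epsilon explore uniformly at random; otherwise
        exploit the current Q-table greedily."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q.get((s, a), 0.0))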

Student/Teacher Advising through Reward Augmentation [article]

Cameron Reid
2020 arXiv   pre-print
Transfer learning is an important new subfield of multiagent reinforcement learning that aims to help an agent learn about a problem by using knowledge that it has gained solving another problem, or by  ...  However, that approach requires that learning from a teacher be treated differently from learning in every other reinforcement learning context.  ...  I've shown that using this approach can significantly speed up learning.  ... 
arXiv:2002.02938v1 fatcat:jczqa3yoffantforknapdz27f4
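
A hedged sketch of what advising through reward augmentation can look like in the tabular case; the additive-bonus form and the `advice` lookup table are my assumptions, since the snippet does not specify the paper's scheme.

    from collections import defaultdict

    Q = defaultdict(float)   # ordinary tabular action values

    def advised_q_update(s, a, r_env, s2, actions, advice,
                         alpha=0.1, gamma=0.99):
        """Teacher advice enters as an additive bonus on the environment
        reward, so the standard Q-learning update is reused unchanged --
        the 'no special treatment' property the abstract argues for.
        `advice` is a hypothetical dict mapping (state, action) to a bonus."""
        r = r_env + advice.get((s, a), 0.0)
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])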

The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning [article]

Harm van Seijen and Hadi Nekoei and Evan Racah and Sarath Chandar
2020 arXiv   pre-print
Deep model-based Reinforcement Learning (RL) has the potential to substantially improve the sample-efficiency of deep RL.  ...  We use our setup to evaluate the model-based behavior of MuZero on a variation of the classic Mountain Car task.  ...  Acknowledgments and Disclosure of Funding We would like to acknowledge Compute Canada and Calcul Quebec for providing computing resources used in this work.  ... 
arXiv:2007.03158v2 fatcat:4jaxhzmxubdk5oilrgte6z4cfe

Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery [chapter]

Scott Proper, Prasad Tadepalli
2006 Lecture Notes in Computer Science  
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity.  ...  To handle the state-space explosion, we introduce "tabular linear functions" that generalize tile-coding and linear value functions.  ...  The authors would like to thank Hong Tang for his initial experiments and code, and Alan Fern, Neville Mehta, Jason Tracy, and Rasaratnam Logendran for many useful discussions.  ... 
doi:10.1007/11871842_74 fatcat:u4v7pizu2rabpfht2vgypgm2he
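
One plausible reading of the paper's "tabular linear functions", sketched below; the split of the state into a discrete cell index plus continuous features is an illustrative guess, not the authors' exact formulation.

    import numpy as np
    from collections import defaultdict

    class TabularLinearFunction:
        """A table of linear value functions: each discrete cell holds its
        own weight vector over continuous features. With constant features
        it degenerates to an ordinary table; with a single cell it
        degenerates to a global linear value function."""
        def __init__(self, n_features):
            self.weights = defaultdict(lambda: np.zeros(n_features))

        def value(self, cell, phi):
            return self.weights[cell] @ phi

        def update(self, cell, phi, target, alpha=0.1):
            self.weights[cell] += alpha * (target - self.value(cell, phi)) * phi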

Learning to drive from a world on rails [article]

Dian Chen, Vladlen Koltun, Philipp Krähenbühl
2021 arXiv   pre-print
Our approach computes action-values for each training trajectory using a tabular dynamic-programming evaluation of the Bellman equations; these action-values in turn supervise the final vision-based driving  ...  Our method is also an order of magnitude more sample-efficient than state-of-the-art model-free reinforcement learning techniques on navigational tasks in the ProcGen benchmark.  ...  This work was supported by the NSF Institute for Foundations of Machine Learning and NSF award #1845485.  ... 
arXiv:2105.00636v3 fatcat:j5luomy7sfchri6f5ksoseojo4
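
A minimal sketch of the tabular dynamic-programming evaluation the abstract describes; deterministic transitions and the array shapes are simplifying assumptions on my part.

    import numpy as np

    def q_backup(rewards, transitions, gamma=0.95, n_iters=100):
        """Iterate the Bellman optimality backup over a dense table.
        rewards[s, a] is the immediate reward and transitions[s, a] the
        deterministic successor state (an integer index)."""
        n_states, n_actions = rewards.shape
        Q = np.zeros((n_states, n_actions))
        for _ in range(n_iters):
            V = Q.max(axis=1)                     # V(s) = max_a Q(s, a)
            Q = rewards + gamma * V[transitions]  # Bellman backup
        return Q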

Gradient Descent Methods in Learning Classifier Systems: Improving XCS Performance in Multistep Problems

M.V. Butz, D.E. Goldberg, P.L. Lanzi
2005 IEEE Transactions on Evolutionary Computation  
Additionally, the extension to gradient methods highlights the relation of XCS to other function approximation methods in reinforcement learning.  ...  Until now, the temporal difference learning technique in XCS has been based on deterministic updates.  ...  Sastry for their help and the useful discussions. P. L. Lanzi wishes to thank M. Colombetti and S. Ceri for invaluable support; M. Butz and P. L. Lanzi also wish to thank D. E.  ... 
doi:10.1109/tevc.2005.850265 fatcat:uihigt5hgfdbxcxilsqglgcuzy
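
The function-approximation update the abstract relates XCS to is the semi-gradient TD rule; a minimal linear version, with the feature encoding and hyperparameters chosen for illustration:

    import numpy as np

    def semi_gradient_q_update(w, phi_sa, r, phi_next_best,
                               alpha=0.1, gamma=0.9):
        """Semi-gradient Q-learning for a linear approximator
        q(s, a) = w @ phi(s, a). phi_sa encodes the current state-action
        pair, phi_next_best the greedy pair in the successor state."""
        td_error = r + gamma * (w @ phi_next_best) - (w @ phi_sa)
        return w + alpha * td_error * phi_sa   # gradient of q wrt w is phi_sa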

Leveraging Human Knowledge in Tabular Reinforcement Learning: A Study of Human Subjects

Ariel Rosenfeld, Matthew E. Taylor, Sarit Kraus
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
... up tabular RL.  ...  Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems.  ...  It has also taken place at the Intelligent Robot Learning (IRL) Lab, which is supported in part by NASA NNX16CD07C, NSF IIS-1149917, NSF IIS-1643614, and USDA 2014-67021-22174.  ... 
doi:10.24963/ijcai.2017/534 dblp:conf/ijcai/RosenfeldTK17 fatcat:43tmosczu5a2hhord6oo6gtike

Deep Reinforcement Learning for 5G Networks: Joint Beamforming, Power Control, and Interference Coordination [article]

Faris B. Mismar and Brian L. Evans and Ahmed Alkhateeb
2019 arXiv   pre-print
... deep reinforcement learning.  ...  By using the greedy nature of deep Q-learning to estimate future rewards of actions and using the reported coordinates of the users served by the network, we propose an algorithm for voice bearers and  ...  Despite the finite size of the state and action spaces, tabular Q-learning is slow to converge because its convergence requires the state-action pairs to be sampled infinitely often [29], [33].  ... 
arXiv:1907.00123v3 fatcat:vfnvdggennamth5g622uee5bee
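
The requirement mentioned in the last sentence is the classical tabular convergence result (Watkins and Dayan), stated here for reference rather than taken from the paper: Q-learning converges to the optimal action values provided every (s, a) pair is visited infinitely often and the step sizes satisfy the Robbins-Monro conditions

    \sum_t \alpha_t = \infty, \qquad \sum_t \alpha_t^2 < \infty .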

Adaptive Agents in Minecraft: A Hybrid Paradigm for Combining Domain Knowledge with Reinforcement Learning [chapter]

Priyam Parashar, Bradley Sheneman, Ashok K. Goel
2017 Lecture Notes in Computer Science  
We present a pilot study focused on creating flexible Hierarchical Task Networks which can leverage Reinforcement Learning to repair and adapt incomplete plans in the simulated rich domain of Minecraft  ...  The main aim of our study is to create flexible knowledge-based planners for robots, which can leverage exploration and guide learning more efficiently by imparting structure using domain knowledge.  ...  Reinforcement Learning: Q-learning We have implemented a tabular form of Q-learning for our reinforcement learning purposes in this paper, using the following update formula. s denotes a state from the  ... 
doi:10.1007/978-3-319-71679-4_6 fatcat:i6qfjjl6arejlearxsm5uxwfmq
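
The formula itself is elided in the snippet; it is presumably the standard tabular Q-learning update, reproduced here for convenience (with s the current state, a the action, r the reward, and s' the successor state):

    Q(s, a) \leftarrow Q(s, a)
        + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]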

RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning [article]

Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel
2016 arXiv   pre-print
Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials.  ...  Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data.  ...  RELATED WORK The concept of using prior experience to speed up reinforcement learning algorithms has been explored in the past in various forms.  ... 
arXiv:1611.02779v2 fatcat:5uies6uzlnhwpdmjwx3ofnz4oq

Improving Reinforcement Learning Speed for Robot Control

Laetitia Matignon, Guillaume Laurent, Nadine Fort-piat
2006 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems  
Reinforcement Learning (RL) is an intuitive way of programming, well suited to autonomous robots, because it does not require specifying how the task is to be achieved.  ...  In this paper, we develop a theoretical study of the influence of some RL parameters on the learning speed.  ...  To implement the tabular Q-learning, we chose a two-dimensional state space x = (θ, ω); 30 × 30 × 9 bases were used for the state-action space (θ, ω, u).  ... 
doi:10.1109/iros.2006.282341 dblp:conf/iros/MatignonLF06 fatcat:xnhnia3uyzdsrofkmiuseaysia
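
A sketch of the discretization the snippet implies; only the 30 × 30 grid over (θ, ω) and the 9 actions come from the entry, while the variable ranges are illustrative guesses.

    import numpy as np

    def discretize(theta, omega, n_theta=30, n_omega=30,
                   theta_range=(-np.pi, np.pi), omega_range=(-10.0, 10.0)):
        """Map a continuous state (theta, omega) to indices into a
        30 x 30 tabular grid; paired with 9 discrete actions u, this
        yields the 30 x 30 x 9 state-action table."""
        i = int(np.clip((theta - theta_range[0])
                        / (theta_range[1] - theta_range[0]) * n_theta,
                        0, n_theta - 1))
        j = int(np.clip((omega - omega_range[0])
                        / (omega_range[1] - omega_range[0]) * n_omega,
                        0, n_omega - 1))
        return i, j
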
Showing results 1 — 15 out of 3,553 results