

Leslie Pack Kaelbling
1996 Machine Learning  
Most convergence results for TD methods rely on the assumption that the underlying environment is Markovian; the paper by Schapire and Warmuth shows that, even for environments that are arbitrarily non-Markovian  ...  The problem of exploration in unknown environments is a crucial one for reinforcement learning.  ... 
doi:10.1007/bf00114721 fatcat:vweynsjrh5hnpdpyj7zad75i6i
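For context on the entry above, the tabular TD(0) rule whose convergence analyses assume a Markovian environment can be sketched in a few lines. This is an illustrative sketch, not code from the paper:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular TD(0) update: move V(s) toward the bootstrapped
    target r + gamma * V(s_next). Standard convergence proofs for
    this rule rely on the environment being Markovian."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```

In a non-Markovian environment the bootstrapped target can be systematically biased, which is why worst-case analyses such as Schapire and Warmuth's are of interest.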

Persistent Rule-based Interactive Reinforcement Learning [article]

Adam Bignold and Francisco Cruz and Richard Dazeley and Peter Vamplew and Cameron Foale
2021 arXiv   pre-print
Interactive reinforcement learning has made it possible to speed up the learning process in autonomous agents by including a human trainer who provides extra information to the agent in real time.  ...  In this work, we propose a persistent rule-based interactive reinforcement learning approach, i.e., a method for retaining and reusing provided knowledge, allowing trainers to give general advice relevant  ...  Introduction Interactive reinforcement learning (IntRL) allows a trainer to guide or evaluate a learning agent's behaviour [1, 2].  ... 
arXiv:2102.02441v2 fatcat:ds6myvafkbbt3c3vu5nh47ttei

An autonomous explore/exploit strategy

Alex McMahon, Dan Scott, Will Browne
2005 Proceedings of the 2005 workshops on Genetic and evolutionary computation - GECCO '05  
The XCS learning classifier system uses a fixed explore/exploit balance, but does keep multiple statistics about its performance and interaction in an environment.  ...  In reinforcement learning problems it has been considered that neither exploitation nor exploration can be pursued exclusively without failing at the task.  ...  Acknowledgements: The authors would also like to thank Jan Drugowitsch for his useful advice.  ... 
doi:10.1145/1102256.1102280 dblp:conf/gecco/McMahonSB05 fatcat:vuqkmebcmfgz3cqdhwq73rtkci
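The fixed explore/exploit balance this entry refers to is commonly realised as an epsilon-greedy rule; a minimal sketch (illustrative, not the XCS implementation):

```python
import random

def select_action(q_values, epsilon):
    """Epsilon-greedy selection: with probability epsilon pick a
    random action (explore); otherwise pick the action with the
    highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```

An autonomous strategy in the spirit of the paper would vary `epsilon` over time using the performance statistics XCS already tracks, rather than holding it fixed.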

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, Sheila A. McIlraith
2018 International Conference on Machine Learning  
QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods, which might converge to suboptimal policies.  ...  We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous  ...  Note that the rewards the agent gets may be non-Markovian relative to the environment (the states of S), though they are Markovian relative to the elements in S × U.  ... 
dblp:conf/icml/IcarteKVM18 fatcat:srjl5b44jjguhi5guh5wg5xfai
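The S × U remark in the snippet can be made concrete with a toy reward machine: a finite automaton over abstract events whose state u, paired with the environment state, makes the reward Markovian again. The "key then door" task below is a hypothetical example, not one from the paper:

```python
class RewardMachine:
    """Toy reward machine: transitions are triggered by high-level
    events and emit rewards. The reward is non-Markovian in the
    environment state alone, but Markovian in the product of
    environment state and machine state u."""

    def __init__(self):
        # (machine state, event) -> (next state, reward)
        self.delta = {
            ("u0", "key"):  ("u1", 0.0),
            ("u1", "door"): ("u2", 1.0),  # reward only for key, then door
        }
        self.u = "u0"

    def step(self, event):
        # Unmatched events leave the machine state unchanged, reward 0
        self.u, reward = self.delta.get((self.u, event), (self.u, 0.0))
        return self.u, reward
```

Reaching the door yields reward only after the key event, so the reward depends on history; conditioning on u restores the Markov property, which is what lets QRM learn one Q-function per machine state.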

Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks [article]

Yuqian Jiang, Sudarshanan Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, Peter Stone
2020 arXiv   pre-print
In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation.  ...  Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy.  ...  We are also interested in expanding the learning problem to include both non-Markovian reward structures and shaping functions.  ... 
arXiv:2007.01498v1 fatcat:rw22zxoaubcxxoji7lr6hknxj4
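For reference, the standard discounted-setting form of the potential-based reward shaping this entry builds on looks like the sketch below; the paper's contribution is adapting shaping to average-reward continuing tasks, which this one-liner does not capture. The potential function `phi` is user-chosen domain knowledge:

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: augment the environment reward r with
    gamma * phi(s_next) - phi(s). In the discounted setting this
    provably preserves the set of optimal policies while speeding up
    learning when phi encodes useful domain knowledge."""
    return r + gamma * phi(s_next) - phi(s)
```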

Evaluating the XCS learning classifier system in competitive simultaneous learning environments

Neera P. Sood, Ashley G. Williams, Kenneth A. De Jong
2005 Proceedings of the 2005 workshops on Genetic and evolutionary computation - GECCO '05  
We would like to evaluate the XCS [1] Learning Classifier System (LCS [2]) to see if it can be applied to a specific aviation industry problem.  ...  We are interested in seeing whether it can offer an accessible representation model and evolve feasible strategies to predict future demand patterns endogenously, and in parallel with the supply side simulation  ...  Given that our problem is non-Markovian, we would also like to test its effectiveness in dealing with a non-Markov environment.  ... 
doi:10.1145/1102256.1102282 dblp:conf/gecco/SoodWJ05 fatcat:t3wt36to3netdcgyzdgojlrd2y

Reinforcement Learning with Attention that Works: A Self-Supervised Approach [article]

Anthony Manchin, Ehsan Abbasnejad, Anton van den Hengel
2019 arXiv   pre-print
We propose the first combination of self attention and reinforcement learning that is capable of producing significant improvements, including new state-of-the-art results in the Arcade Learning Environment  ...  Attention models have had a significant positive impact on deep learning across a range of tasks.  ...  Acknowledgments We would like to thank Michele Sasdelli for his helpful discussions, and Damien Teney for his feedback and advice on writing this paper.  ... 
arXiv:1904.03367v1 fatcat:5qlgijugpjfd3p6wzvb4m27u6y

Knowledge Based Reinforcement Learning Robot in Maze Environment

D. Venkata Vara Prasad, Chitra Devi. J, Karpagam. P, Manju Priyadharsini. D
2011 International Journal of Computer Applications  
It provides robots with the capability of learning to act optimally in a Markovian environment.  ...  Techniques for accelerating reinforcement learning on real robots include (1) guiding exploration by human demonstration, advice, or an approximate pre-installed controller, and (2) using replayed experiences  ... 
doi:10.5120/1895-2525 fatcat:g7fik3joqzfvzke2eengxj6rzu

Navigation towards a goal position: from reactive to generalised learned control

Valdinei Freire da Silva, Antonio Henrique Selvatici, Anna Helena Reali Costa
2011 Journal of Physics: Conference Series  
only learn to act in the current environment, but also to generalise prior knowledge to the current environment in order to achieve the goal more quickly in a non-convex structured environment.  ...  Because of the limitations presented by the Potential Fields method, especially in relation to non-convex obstacles, we are investigating the use of Relational Reinforcement Learning as a method to not  ...  In this paper we investigate the use of Relational Reinforcement Learning as a method to not only learn how to navigate in the current environment, but also to generalise the acquired knowledge in order  ... 
doi:10.1088/1742-6596/285/1/012025 fatcat:r3nlw4lqqbd4rdyboz6a3ednji

Socially guided intrinsic motivation for robot learning of motor skills

Sao Mai Nguyen, Pierre-Yves Oudeyer
2013 Autonomous Robots  
human demonstration properties to learn how to produce varied outcomes in the environment, while developing more precise control policies in large spaces.  ...  In an experiment where a robot arm has to learn to use a flexible fishing line, we illustrate that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation and benefits from  ...  Taking into account the non-Markovian behaviour of human beings would induce high complexity in the reinforcement learning framework.  ... 
doi:10.1007/s10514-013-9339-y fatcat:n452p2dufzcl5pcwxluhax3h3q

Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation [article]

Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati
2021 arXiv   pre-print
In this paper, we present the first study of using human visual explanations in human-in-the-loop reinforcement learning (HRL).  ...  However, this kind of human guidance was only investigated in supervised learning tasks, and it remains unclear how to best incorporate this type of human knowledge into deep reinforcement learning.  ...  and a JP Morgan AI Faculty Research grant.  ... 
arXiv:2006.14804v5 fatcat:k6ugjnb56bhazopdmv22jsme7q

How Active Inference Could Help Revolutionise Robotics

Lancelot Da Costa, Pablo Lanillos, Noor Sajid, Karl Friston, Shujhat Khan
2022 Entropy  
In this paper, we explain how active inference—a well-known description of sentient behaviour from neuroscience—can be exploited in robotics.  ...  In short, active inference leverages the processes thought to underwrite human behaviour to build effective autonomous systems.  ...  The authors thank Areeb Mian and Sima Al-Asad for helpful input on a previous version of the manuscript. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/e24030361 pmid:35327872 pmcid:PMC8946999 fatcat:qiyc4hul3ffjjmq5buq524mgiu

Multi-agent deep reinforcement learning: a survey

Sven Gronauer, Klaus Diepold
2021 Artificial Intelligence Review  
The advances in reinforcement learning have recorded sublime success in various domains.  ...  We focus primarily on literature from recent years that combines deep reinforcement learning methods with a multi-agent scenario.  ...  Hence, the Markov property is not fulfilled, and the environment appears non-Markovian.  ... 
doi:10.1007/s10462-021-09996-w fatcat:blu4ekwaxjfo5it3y7taqnzq4a

Hierarchical Imitation and Reinforcement Learning [article]

Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III
2018 arXiv   pre-print
We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning.  ...  Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration  ...  HML is also supported in part by an Amazon AI Fellowship.  ... 
arXiv:1803.00590v2 fatcat:h3xkznuy7jdqnijohpbn7tu6ji

Learning polite behavior with situation models

Rémi Barraquand, James L. Crowley
2008 Proceedings of the 3rd international conference on Human robot interaction - HRI '08  
In the fourth experiment we demonstrate that proper credit assignment improves the effectiveness of reinforcement learning for social interaction.  ...  In this paper, we describe experiments with methods for learning the appropriateness of behaviors based on a model of the current social situation.  ...  This is an early approach to using reinforcement learning in a complex human online social environment, where many of the standard assumptions (stationary rewards, Markovian behavior, and appropriateness  ... 
doi:10.1145/1349822.1349850 dblp:conf/hri/BarraquandC08 fatcat:lkqecuip6ba77khlzn67ycp2by