

Leslie Pack Kaelbling
1996 Machine Learning  
Most convergence results for TD methods rely on the assumption that the underlying environment is Markovian; the paper by Schapire and Warmuth shows that, even for environments that are arbitrarily non-Markovian  ...  The problem of exploration in unknown environments is a crucial one for reinforcement learning.  ... 
doi:10.1007/bf00114721 fatcat:vweynsjrh5hnpdpyj7zad75i6i
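For context on the entry above, the tabular TD(0) rule whose convergence analyses assume a Markovian environment can be sketched in a few lines. This is an illustrative sketch, not code from the paper:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular TD(0) update: move V(s) toward the bootstrapped
    target r + gamma * V(s_next). Standard convergence proofs for
    this rule rely on the environment being Markovian."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```

In a non-Markovian environment the bootstrapped target can be systematically biased, which is why worst-case analyses such as Schapire and Warmuth's are of interest.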

Persistent Rule-based Interactive Reinforcement Learning [article]

Adam Bignold and Francisco Cruz and Richard Dazeley and Peter Vamplew and Cameron Foale
2021 arXiv   pre-print
Interactive reinforcement learning has made it possible to speed up the learning process in autonomous agents by including a human trainer who provides extra information to the agent in real time.  ...  In this work, we propose a persistent rule-based interactive reinforcement learning approach, i.e., a method for retaining and reusing provided knowledge, allowing trainers to give general advice relevant  ...  Introduction Interactive reinforcement learning (IntRL) allows a trainer to guide or evaluate a learning agent's behaviour [1, 2].  ... 
arXiv:2102.02441v2 fatcat:ds6myvafkbbt3c3vu5nh47ttei

An autonomous explore/exploit strategy

Alex McMahon, Dan Scott, Will Browne
2005 Proceedings of the 2005 workshops on Genetic and evolutionary computation - GECCO '05  
The XCS learning classifier system uses a fixed explore/exploit balance, but does keep multiple statistics about its performance and interaction in an environment.  ...  In reinforcement learning problems it has been considered that neither exploitation nor exploration can be pursued exclusively without failing at the task.  ...  Acknowledgements: The authors would also like to thank Jan Drugowitsch for his useful advice.  ... 
doi:10.1145/1102256.1102280 dblp:conf/gecco/McMahonSB05 fatcat:vuqkmebcmfgz3cqdhwq73rtkci
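The fixed explore/exploit balance this entry refers to is commonly realised as an epsilon-greedy rule; a minimal sketch (illustrative, not the XCS implementation):

```python
import random

def select_action(q_values, epsilon):
    """Epsilon-greedy selection: with probability epsilon pick a
    random action (explore); otherwise pick the action with the
    highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```

An autonomous strategy in the spirit of the paper would vary `epsilon` over time using the performance statistics XCS already tracks, rather than holding it fixed.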

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, Sheila A. McIlraith
2018 International Conference on Machine Learning  
QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods, which might converge to suboptimal policies.  ...  We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous  ...  Note that the rewards the agent gets may be non-Markovian relative to the environment (the states of S), though they are Markovian relative to the elements in S × U.  ... 
dblp:conf/icml/IcarteKVM18 fatcat:srjl5b44jjguhi5guh5wg5xfai
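The S × U remark in the snippet can be made concrete with a toy reward machine: a finite automaton over abstract events whose state u, paired with the environment state, makes the reward Markovian again. The "key then door" task below is a hypothetical example, not one from the paper:

```python
class RewardMachine:
    """Toy reward machine: transitions are triggered by high-level
    events and emit rewards. The reward is non-Markovian in the
    environment state alone, but Markovian in the product of
    environment state and machine state u."""

    def __init__(self):
        # (machine state, event) -> (next state, reward)
        self.delta = {
            ("u0", "key"):  ("u1", 0.0),
            ("u1", "door"): ("u2", 1.0),  # reward only for key, then door
        }
        self.u = "u0"

    def step(self, event):
        # Unmatched events leave the machine state unchanged, reward 0
        self.u, reward = self.delta.get((self.u, event), (self.u, 0.0))
        return self.u, reward
```

Reaching the door yields reward only after the key event, so the reward depends on history; conditioning on u restores the Markov property, which is what lets QRM learn one Q-function per machine state.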

Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks [article]

Yuqian Jiang, Sudarshanan Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, Peter Stone
2020 arXiv   pre-print
In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation.  ...  Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy.  ...  We are also interested in expanding the learning problem to include both non-Markovian reward structures and shaping functions.  ... 
arXiv:2007.01498v1 fatcat:rw22zxoaubcxxoji7lr6hknxj4
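For reference, the standard discounted-setting form of the potential-based reward shaping this entry builds on looks like the sketch below; the paper's contribution is adapting shaping to average-reward continuing tasks, which this one-liner does not capture. The potential function `phi` is user-chosen domain knowledge:

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: augment the environment reward r with
    gamma * phi(s_next) - phi(s). In the discounted setting this
    provably preserves the set of optimal policies while speeding up
    learning when phi encodes useful domain knowledge."""
    return r + gamma * phi(s_next) - phi(s)
```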

Evaluating the XCS learning classifier system in competitive simultaneous learning environments

Neera P. Sood, Ashley G. Williams, Kenneth A. De Jong
2005 Proceedings of the 2005 workshops on Genetic and evolutionary computation - GECCO '05  
We would like to evaluate the XCS [1] Learning Classifier System (LCS [2]) to see if it can be applied to a specific aviation industry problem.  ...  We are interested in seeing whether it can offer an accessible representation model and evolve feasible strategies to predict future demand patterns endogenously, and in parallel with the supply side simulation  ...  Given that our problem is non-Markovian, we would also like to test its effectiveness in dealing with a non-Markov environment.  ... 
doi:10.1145/1102256.1102282 dblp:conf/gecco/SoodWJ05 fatcat:t3wt36to3netdcgyzdgojlrd2y

Reinforcement Learning with Attention that Works: A Self-Supervised Approach [article]

Anthony Manchin, Ehsan Abbasnejad, Anton van den Hengel
2019 arXiv   pre-print
We propose the first combination of self attention and reinforcement learning that is capable of producing significant improvements, including new state-of-the-art results in the Arcade Learning Environment  ...  Attention models have had a significant positive impact on deep learning across a range of tasks.  ...  Acknowledgments We would like to thank Michele Sasdelli for his helpful discussions, and Damien Teney for his feedback and advice on writing this paper.  ... 
arXiv:1904.03367v1 fatcat:5qlgijugpjfd3p6wzvb4m27u6y

Knowledge Based Reinforcement Learning Robot in Maze Environment

D. Venkata Vara Prasad, Chitra Devi. J, Karpagam. P, Manju Priyadharsini. D
2011 International Journal of Computer Applications  
It provides robots with the capability of learning to act optimally in a Markovian environment.  ...  Techniques for accelerating reinforcement learning on real robots include (1) guiding exploration by human demonstration, advice, or an approximate pre-installed controller, and (2) using replayed experiences  ... 
doi:10.5120/1895-2525 fatcat:g7fik3joqzfvzke2eengxj6rzu

Navigation towards a goal position: from reactive to generalised learned control

Valdinei Freire da Silva, Antonio Henrique Selvatici, Anna Helena Reali Costa
2011 Journal of Physics: Conference Series  
only learn to act in the current environment, but also to generalise prior knowledge to the current environment in order to achieve the goal more quickly in a non-convex structured environment.  ...  Because of the limitations presented by the Potential Fields method, especially in relation to non-convex obstacles, we are investigating the use of Relational Reinforcement Learning as a method to not  ...  In this paper we investigate the use of Relational Reinforcement Learning as a method to not only learn how to navigate in the current environment, but also to generalise the acquired knowledge in order  ... 
doi:10.1088/1742-6596/285/1/012025 fatcat:r3nlw4lqqbd4rdyboz6a3ednji

Socially guided intrinsic motivation for robot learning of motor skills

Sao Mai Nguyen, Pierre-Yves Oudeyer
2013 Autonomous Robots  
human demonstration properties to learn how to produce varied outcomes in the environment, while developing more precise control policies in large spaces.  ...  In an experiment where a robot arm has to learn to use a flexible fishing line, we illustrate that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation and benefits from  ...  Taking into account the non-Markovian behaviour of human beings would induce high complexity in the reinforcement learning framework.  ... 
doi:10.1007/s10514-013-9339-y fatcat:n452p2dufzcl5pcwxluhax3h3q

Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation [article]

Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati
2021 arXiv   pre-print
In this paper, we present the first study of using human visual explanations in human-in-the-loop reinforcement learning (HRL).  ...  However, this kind of human guidance was only investigated in supervised learning tasks, and it remains unclear how to best incorporate this type of human knowledge into deep reinforcement learning.  ...  and a JP Morgan AI Faculty Research grant.  ... 
arXiv:2006.14804v5 fatcat:k6ugjnb56bhazopdmv22jsme7q

How Active Inference Could Help Revolutionise Robotics

Lancelot Da Costa, Pablo Lanillos, Noor Sajid, Karl Friston, Shujhat Khan
2022 Entropy  
In this paper, we explain how active inference—a well-known description of sentient behaviour from neuroscience—can be exploited in robotics.  ...  In short, active inference leverages the processes thought to underwrite human behaviour to build effective autonomous systems.  ...  The authors thank Areeb Mian and Sima Al-Asad for helpful input on a previous version of the manuscript. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/e24030361 pmid:35327872 pmcid:PMC8946999 fatcat:qiyc4hul3ffjjmq5buq524mgiu

Multi-agent deep reinforcement learning: a survey

Sven Gronauer, Klaus Diepold
2021 Artificial Intelligence Review  
The advances in reinforcement learning have recorded sublime success in various domains.  ...  We focus primarily on literature from recent years that combines deep reinforcement learning methods with a multi-agent scenario.  ...  Hence, the Markov property is not fulfilled, and the environment appears non-Markovian.  ... 
doi:10.1007/s10462-021-09996-w fatcat:blu4ekwaxjfo5it3y7taqnzq4a

Hierarchical Imitation and Reinforcement Learning [article]

Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III
2018 arXiv   pre-print
We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning.  ...  Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration  ...  HML is also supported in part by an Amazon AI Fellowship.  ... 
arXiv:1803.00590v2 fatcat:h3xkznuy7jdqnijohpbn7tu6ji

Learning polite behavior with situation models

Rémi Barraquand, James L. Crowley
2008 Proceedings of the 3rd international conference on Human robot interaction - HRI '08  
In the fourth experiment we demonstrate that proper credit assignment improves the effectiveness of reinforcement learning for social interaction.  ...  In this paper, we describe experiments with methods for learning the appropriateness of behaviors based on a model of the current social situation.  ...  This is an early approach to using reinforcement learning in a complex human online social environment, where many of the standard assumptions (stationary rewards, Markovian behavior, and appropriateness  ... 
doi:10.1145/1349822.1349850 dblp:conf/hri/BarraquandC08 fatcat:lkqecuip6ba77khlzn67ycp2by