More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences [article]

Toby Johnstone, Nathan Grinsztajn, Johan Ferret, Philippe Preux
2021 arXiv pre-print
Incorporating prior knowledge into reinforcement learning algorithms remains largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting, where the agent must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence, that is, knowing when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.
arXiv:2110.10632v2
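The convex optimization mentioned in the abstract can be illustrated with a small sketch. The formulation below is a hypothetical reconstruction, not the authors' exact algorithm: given an equivalence prior that groups length-k action sequences by their effect, we choose a sampling distribution over sequences that minimizes the collision probability, i.e., the chance that two independently sampled sequences fall into the same equivalence class. That objective is a convex quadratic over the probability simplex, solved here with cvxpy; the 1-D grid dynamics, the effect function, and the sequence length are assumptions made purely for illustration.

```python
import itertools

import numpy as np
import cvxpy as cp

# Hypothetical equivalence prior (illustration only): in a 1-D grid world,
# two action sequences are equivalent when they have the same net displacement.
ACTIONS = {"L": -1, "R": +1, "N": 0}  # left, right, no-op
K = 2                                  # length of local action sequences

sequences = list(itertools.product(ACTIONS, repeat=K))

def effect(seq):
    """Symbolic effect of a sequence: its net displacement."""
    return sum(ACTIONS[a] for a in seq)

classes = sorted({effect(s) for s in sequences})

# Membership matrix M: M[c, s] = 1 iff sequence s lies in equivalence class c.
M = np.array([[1.0 if effect(s) == c else 0.0 for s in sequences]
              for c in classes])

# Collision probability of a sequence distribution p:
#   P(collision) = sum_c (sum_{s in c} p_s)^2 = ||M p||_2^2,
# a convex quadratic minimized over the probability simplex.
p = cp.Variable(len(sequences), nonneg=True)
problem = cp.Problem(cp.Minimize(cp.sum_squares(M @ p)), [cp.sum(p) == 1])
problem.solve()

for c, mass in zip(classes, M @ p.value):
    print(f"net displacement {c:+d}: class probability {mass:.3f}")
```

Under this toy prior, the optimum spreads probability mass uniformly across the five displacement classes rather than across the nine raw sequences, so redundant sequences such as LR, RL, and NN jointly receive no more mass than the unique sequence LL. Sampling exploratory sequences from such a distribution, in place of uniform epsilon-greedy noise, is one way to read the strategy the abstract describes.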