Active Reinforcement Learning: Observing Rewards at a Cost [article]

David Krueger, Jan Leike, Owain Evans, John Salvatier
2020 arXiv pre-print
Active reinforcement learning (ARL) is a variant of reinforcement learning in which the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.
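To make the setting concrete, the following is a minimal sketch of the ARL interaction loop on a Bernoulli bandit. It is an illustration only, not one of the heuristics evaluated in the paper: the function name run_active_bandit, the query_budget parameter, and the "query in round-robin for a fixed number of steps, then exploit without querying" rule are all assumptions introduced here. The agent always receives the reward, but it only observes the reward (and pays the cost c) on steps where it queries.

```python
import random

def run_active_bandit(means, cost, horizon, query_budget, seed=0):
    """Hypothetical sketch: Bernoulli bandit where a reward is observed only if paid for.

    means        -- true success probability of each arm (hidden from the agent)
    cost         -- query cost c > 0 paid whenever the agent observes a reward
    horizon      -- number of time steps
    query_budget -- assumed heuristic: query on the first `query_budget` steps only
    Returns (total reward minus query costs, number of queries made).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of observed rewards per arm
    sums = [0.0] * n_arms   # sum of observed rewards per arm
    total_return = 0.0
    queries = 0
    for t in range(horizon):
        query = t < query_budget
        if query:
            # Explore in round-robin order while still paying for observations.
            arm = t % n_arms
        else:
            # Exploit the empirical means built from observed rewards only.
            est = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: est[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        total_return += reward          # the reward is always received...
        if query:
            total_return -= cost        # ...but only observed when the cost is paid
            counts[arm] += 1
            sums[arm] += reward
            queries += 1
    return total_return, queries

if __name__ == "__main__":
    print(run_active_bandit([0.2, 0.8], cost=0.1, horizon=1000, query_budget=50))
```

Choosing query_budget by hand is exactly the difficulty the abstract points to: a larger budget gives better estimates but costs more, and the right trade-off depends on the long-term value of the reward information.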
arXiv:2011.06709v2 fatcat:x5sce4tcavhrjjtqjybapmkeru