Efficient Exploration in Reinforcement Learning through Time-Based Representations

Marlos Cholodovskis Machado
In the reinforcement learning (RL) problem, an agent must learn how to act optimally through trial-and-error interactions with a complex, unknown, stochastic environment. The actions taken by the agent influence not just the immediate reward it observes but also the future states and rewards it will observe, implicitly requiring the agent to deal with the trade-off between short-term and long-term consequences. In this context, the problem of exploration is the ...
doi:10.7939/r3-zaq3-vs36 fatcat:ncn5ppb4kbbbrmulupi6x34cpa