In reinforcement learning, all objective functions are not equal

Romain Laroche, Harm van Seijen
2018 International Conference on Learning Representations  
We study the learnability of value functions. To factor out reward back-propagation, we directly fit a deep neural network to the analytically computed optimal value function induced by a chosen objective function. We show that some objective functions are easier to train on than others, by several orders of magnitude. In particular, we observe the influence of the discount factor γ and of the decomposition of the task into subtasks.
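The experimental protocol described in the abstract can be sketched as a plain supervised regression: compute the optimal value function of a small MDP in closed form, then fit a network to it and measure the fitting error. A minimal sketch, assuming a hypothetical chain MDP (not the paper's actual environments) whose optimal values are V*(s) = γ^(N−1−s), fitted with a small NumPy-only one-hidden-layer network:

```python
import numpy as np

def optimal_values(n_states, gamma):
    # Hypothetical chain MDP: moving right is optimal and yields reward 1
    # at the last state, so V*(s) = gamma ** (distance to the goal).
    return gamma ** np.arange(n_states - 1, -1, -1)

def fit_value_network(v_star, hidden=32, lr=0.2, steps=5000, seed=0):
    # Supervised regression of V* with a tiny tanh MLP trained by
    # full-batch gradient descent on the mean squared error.
    rng = np.random.default_rng(seed)
    n = len(v_star)
    x = (np.arange(n) / (n - 1)).reshape(-1, 1)   # normalized state index
    y = v_star.reshape(-1, 1)
    w1 = rng.normal(0.0, 1.0, (1, hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)); b2 = np.zeros(1)
    for _ in range(steps):
        h = np.tanh(x @ w1 + b1)
        pred = h @ w2 + b2
        err = pred - y                            # gradient of MSE w.r.t. pred (up to 2/n)
        gh = (err @ w2.T) * (1.0 - h ** 2)        # back-prop through tanh
        w2 -= lr * (h.T @ err / n); b2 -= lr * err.mean(0)
        w1 -= lr * (x.T @ gh / n);  b1 -= lr * gh.mean(0)
    pred = np.tanh(x @ w1 + b1) @ w2 + b2
    return float(((pred - y) ** 2).mean())        # final training MSE

mse = fit_value_network(optimal_values(20, gamma=0.9))
```

Comparing this final MSE across settings (e.g. different values of γ, or a task split into subtasks each with its own value function) gives a direct measure of how hard each target is to learn, independently of any reward propagation.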
dblp:conf/iclr/LarocheS18 fatcat:lb2depmppnexbesgheinmzw7jq