A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019.
We study the learnability of value functions. We set reward backpropagation aside by directly fitting a deep neural network to the analytically computed optimal value function of a chosen objective function. We show that some objective functions are easier to train than others by several orders of magnitude. In particular, we observe the influence of the discount factor γ and of decomposing the task into subtasks.

dblp:conf/iclr/LarocheS18 fatcat:lb2depmppnexbesgheinmzw7jq
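The abstract does not describe the environments or objectives used, so the following is only a minimal sketch under assumptions of our own: a hypothetical deterministic chain task where reaching the last state yields reward 1. It shows what "fitting directly on the analytically computed optimal value function" can mean in practice. The optimal values V* are computed exactly (here by value iteration, checked against a closed form) and then used as plain supervised regression targets, so no reward has to be backpropagated through Bellman updates during training. The function names `optimal_values`, `closed_form_values` and `regression_targets` are illustrative, not the authors' code.

```python
from typing import List, Tuple


def optimal_values(n_states: int, gamma: float, tol: float = 1e-12) -> List[float]:
    """Value iteration on a hypothetical deterministic chain task.

    States are 0..n_states-1. Each action moves one step left (clipped at 0)
    or one step right. Entering the last state (the goal) gives reward 1 and
    ends the episode, so the goal's own value stays 0.
    """
    goal = n_states - 1
    v = [0.0] * n_states
    while True:
        new_v = v[:]
        for s in range(goal):  # all non-terminal states
            best = float("-inf")
            for s_next in (max(s - 1, 0), s + 1):
                reward = 1.0 if s_next == goal else 0.0
                future = 0.0 if s_next == goal else gamma * v[s_next]
                best = max(best, reward + future)
            new_v[s] = best
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            return v


def closed_form_values(n_states: int, gamma: float) -> List[float]:
    """Analytic V* for the same chain: V*(s) = gamma ** (goal - s - 1)."""
    goal = n_states - 1
    return [gamma ** (goal - s - 1) if s < goal else 0.0 for s in range(n_states)]


def regression_targets(n_states: int, gamma: float) -> List[Tuple[int, float]]:
    """(state, V*(state)) pairs a value network could be regressed on directly."""
    return list(zip(range(n_states), optimal_values(n_states, gamma)))


if __name__ == "__main__":
    # gamma controls how quickly the targets decay with distance to the goal.
    for gamma in (0.5, 0.9, 0.99):
        targets = regression_targets(10, gamma)
        print(gamma, [round(value, 3) for _, value in targets])
```

In this toy task the targets decay as γ raised to the distance from the goal, so γ directly sets the range of values the network has to fit. Swapping in another task only changes how the targets are computed; the supervised fitting step stays the same.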