A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017.
The file type is application/pdf.
Interpolation-based Q-learning
2004
Twenty-first international conference on Machine learning - ICML '04
We consider a variant of Q-learning in continuous state spaces under the total expected discounted cost criterion combined with local function approximation methods. Provided that the function approximator satisfies certain interpolation properties, the resulting algorithm is shown to converge with probability one. The limit function is shown to satisfy a fixed point equation of the Bellman type, where the fixed point operator depends on the stationary distribution of the exploration policy and
doi:10.1145/1015330.1015445
dblp:conf/icml/SzepesvariS04
fatcat:m6g4fecxkjawfhlex6tquf5ybe
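
The abstract describes Q-learning combined with a local function approximator that satisfies interpolation properties. As a rough illustration only, the sketch below uses piecewise-linear interpolation on a one-dimensional grid: the weights are nonnegative, sum to one, and put all mass on a node when the state coincides with it. The function names (`interp_weights`, `q_values`, `update`), the 1-D state space, the constant step size, and the exact form of the update are assumptions made for this example; they are not taken from the paper. Because the abstract uses a cost criterion, the target takes a minimum over actions.

```python
import numpy as np

def interp_weights(x, nodes):
    """Piecewise-linear interpolation weights of scalar x on sorted grid nodes.

    Weights are nonnegative, sum to 1, and equal 1 at a node when x is that node.
    """
    x = float(np.clip(x, nodes[0], nodes[-1]))
    w = np.zeros(len(nodes))
    # Index of the left node of the cell containing x.
    j = int(np.searchsorted(nodes, x, side="right")) - 1
    j = min(max(j, 0), len(nodes) - 2)
    t = (x - nodes[j]) / (nodes[j + 1] - nodes[j])
    w[j], w[j + 1] = 1.0 - t, t
    return w

def q_values(theta, x, nodes):
    """Interpolated action values at state x; theta has shape (n_nodes, n_actions)."""
    return interp_weights(x, nodes) @ theta

def update(theta, x, a, cost, y, nodes, alpha, gamma):
    """One illustrative update from a transition (x, a, cost, y).

    Assumed form: each node's parameter moves toward the one-step target,
    scaled by that node's interpolation weight at x.
    """
    w = interp_weights(x, nodes)
    target = cost + gamma * q_values(theta, y, nodes).min()  # min: costs, not rewards
    theta[:, a] += alpha * w * (target - theta[:, a])
    return theta
```

A caller would keep `theta` as a zero-initialised array of shape `(len(nodes), n_actions)` and call `update` once per observed transition generated by the exploration policy. A decaying step size would be the more usual choice in a convergence setting; the constant `alpha` here is only for brevity.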