Interpolation-based Q-learning

Csaba Szepesvári, William D. Smart
Twenty-first International Conference on Machine Learning (ICML '04), 2004
We consider a variant of Q-learning in continuous state spaces under the total expected discounted cost criterion combined with local function approximation methods. Provided that the function approximator satisfies certain interpolation properties, the resulting algorithm is shown to converge with probability one. The limit function is shown to satisfy a fixed point equation of the Bellman type, where the fixed point operator depends on the stationary distribution of the exploration policy and the function approximation method. The basic algorithm is extended in several ways. In particular, a variant of the algorithm is obtained that is shown to converge in probability to the optimal Q function. Preliminary computer simulations are presented that confirm the validity of the approach.
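The idea of Q-learning with an interpolating approximator can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: Q-values are stored at uniformly spaced knots on a 1-D state space and linearly interpolated between them (the weights are nonnegative and sum to one). Each update moves the two bracketing knots toward a discounted-cost Bellman target, with a step size scaled by each knot's interpolation weight. The names `InterpQ`, `step` and `train`, the toy task (cost equals position, actions move left or right by 0.1), and the uniform exploration scheme are all hypothetical choices made for this example.

```python
import random


class InterpQ:
    """Q-values stored at uniformly spaced knots on [lo, hi]; values between
    knots come from linear interpolation (illustrative assumption)."""

    def __init__(self, n_knots, n_actions, lo=0.0, hi=1.0):
        self.n, self.n_actions = n_knots, n_actions
        self.lo, self.hi = lo, hi
        self.theta = [[0.0] * n_actions for _ in range(n_knots)]

    def weights(self, x):
        """Interpolation weights: nonnegative, summing to 1, nonzero only
        for the two knots that bracket x."""
        x = min(max(x, self.lo), self.hi)
        pos = (x - self.lo) / (self.hi - self.lo) * (self.n - 1)
        i = min(int(pos), self.n - 2)
        t = pos - i
        return [(i, 1.0 - t), (i + 1, t)]

    def q(self, x, a):
        return sum(w * self.theta[i][a] for i, w in self.weights(x))

    def update(self, x, a, cost, y, alpha, gamma):
        # Bellman target for the discounted-cost criterion (minimisation).
        target = cost + gamma * min(self.q(y, b) for b in range(self.n_actions))
        # Each bracketing knot moves toward the target with a step size
        # scaled by its interpolation weight (hypothetical update form).
        for i, w in self.weights(x):
            self.theta[i][a] += alpha * w * (target - self.theta[i][a])


def step(x, a):
    """Hypothetical toy task: action 0 moves left, action 1 moves right,
    and the per-step cost equals the current position."""
    y = x - 0.1 if a == 0 else x + 0.1
    return x, min(max(y, 0.0), 1.0)


def train(n_steps=50_000, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    qf = InterpQ(n_knots=11, n_actions=2)
    for _ in range(n_steps):
        # Exploration: states and actions are sampled uniformly at random,
        # so the "stationary distribution" here is simply uniform.
        x, a = rng.random(), rng.randrange(2)
        cost, y = step(x, a)
        qf.update(x, a, cost, y, alpha, gamma)
    return qf


qf = train()
print(qf.q(0.5, 0), qf.q(0.5, 1))
```

Because `alpha * w` never exceeds 1, each knot update is a convex combination of its old value and the target, which keeps the iterates bounded. In this toy task one would expect moving left (toward lower cost) to receive the smaller estimated Q-value at `x = 0.5`.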
doi:10.1145/1015330.1015445 dblp:conf/icml/SzepesvariS04 fatcat:m6g4fecxkjawfhlex6tquf5ybe