Convergence of Reinforcement Learning with General Function Approximators

Vassilis A. Papavassiliou, Stuart J. Russell
1999 International Joint Conference on Artificial Intelligence  
A key open problem in reinforcement learning is to assure convergence when using a compact hypothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is a linear combination of fixed basis functions, it may diverge with a general (nonlinear) hypothesis class. This paper describes the Bridge algorithm, a new method for reinforcement learning, and shows that it converges to an approximate global optimum for any agnostically learnable hypothesis class. Convergence is demonstrated on a simple example for which temporal-difference learning fails. Weak conditions are identified under which the Bridge algorithm converges for any hypothesis class. Finally, connections are made between the complexity of reinforcement learning and the PAC-learnability of the hypothesis class.
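
For concreteness, the convergent linear case the abstract refers to can be sketched as semi-gradient TD(0) over a fixed feature map. The following Python sketch is illustrative only and is not taken from the paper: the chain MDP, the two basis functions, and the step size are all assumptions made for this example.

```python
# Minimal sketch of semi-gradient TD(0) with a linear function
# approximator -- the setting in which convergence is known.
# The MDP, features, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma, alpha = 5, 0.9, 0.05

def step(s):
    # Fixed-policy chain: move right with prob 0.7, left with prob 0.3,
    # clamped at the ends; reward 1 for entering the last state.
    s_next = min(s + 1, n_states - 1) if rng.random() < 0.7 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

def phi(s):
    # Two fixed basis functions per state: a constant and a scaled index.
    return np.array([1.0, s / (n_states - 1)])

w = np.zeros(2)
s = 0
for _ in range(50_000):
    s_next, r = step(s)
    # TD(0) update: w += alpha * (r + gamma * V(s') - V(s)) * phi(s)
    td_error = r + gamma * phi(s_next) @ w - phi(s) @ w
    w += alpha * td_error * phi(s)
    s = s_next

print("learned weights:", w)
print("approximate values:", [float(phi(s) @ w) for s in range(n_states)])
```

With a general nonlinear approximator in place of the linear model phi(s) @ w, this same update can diverge; that failure mode is what motivates the paper's Bridge algorithm.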