Nonparametric Return Distribution Approximation for Reinforcement Learning

Tetsuro Morimura, Masashi Sugiyama, Hisashi Kashima, Hirotaka Hachiya, Toshiyuki Tanaka
2010 International Conference on Machine Learning  
Standard Reinforcement Learning (RL) aims to optimize decision-making rules in terms of the expected return. However, especially for risk-management purposes, other criteria such as the expected shortfall are sometimes preferred. Here, we describe a method of approximating the distribution of returns, which allows us to derive various kinds of information about the returns. We first show that the Bellman equation, which is a recursive formula for the expected return, can be extended to the cumulative return distribution. Then we derive a nonparametric return distribution estimator with particle smoothing based on this extended Bellman equation. A key aspect of the proposed algorithm is to represent the recursion relation in the extended Bellman equation by a simple replacement procedure of particles associated with a state by using those of the successor state. We show that our algorithm leads to a risk-sensitive RL paradigm. The usefulness of the proposed approach is demonstrated through numerical experiments.
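To make the particle-replacement idea concrete, the following is a minimal Python sketch, not the authors' implementation: it simplifies their particle-smoothing estimator to plain resampling, and the discount factor, particle count, and all function names (init_particles, replace_particles, expected_shortfall) are illustrative assumptions. Each state keeps a set of particles approximating its return distribution; the extended Bellman recursion Z(s) =_d R + gamma * Z(s') is realized by overwriting a state's particles with reward-shifted, discounted copies of the successor state's particles, after which a risk-sensitive criterion such as the expected shortfall can be read off the particle set.

import numpy as np

GAMMA = 0.95        # discount factor (assumed for illustration)
N_PARTICLES = 100   # particles per state (assumed for illustration)

def init_particles(n_states, n_particles=N_PARTICLES, seed=0):
    # One particle set per state; the initialization is arbitrary here.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0, size=(n_states, n_particles))

def replace_particles(particles, s, r, s_next, rng):
    # Extended Bellman recursion, realized as a replacement procedure:
    # overwrite the particles of s with r + GAMMA * (resampled particles of s').
    idx = rng.integers(0, particles.shape[1], size=particles.shape[1])
    particles[s] = r + GAMMA * particles[s_next, idx]

def expected_shortfall(samples, alpha=0.1):
    # Mean of the worst alpha-fraction of returns: a risk-sensitive criterion.
    sorted_samples = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(sorted_samples))))
    return sorted_samples[:k].mean()

# Example: after observing transition (s=0, r=1.0, s'=1), update state 0's
# particles and evaluate the 10% expected shortfall of its return distribution.
rng = np.random.default_rng(1)
particles = init_particles(n_states=2)
replace_particles(particles, s=0, r=1.0, s_next=1, rng=rng)
print(expected_shortfall(particles[0], alpha=0.1))

Because the update only moves particles between states, its per-transition cost is linear in the number of particles; the paper's smoothing step would additionally regularize the resampled particles.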