Nonparametric Return Distribution Approximation for Reinforcement Learning
2010
International Conference on Machine Learning
Standard Reinforcement Learning (RL) aims to optimize decision-making rules in terms of the expected return. However, especially for risk-management purposes, other criteria such as the expected shortfall are sometimes preferred. Here, we describe a method of approximating the distribution of returns, which allows us to derive various kinds of information about the returns. We first show that the Bellman equation, which is a recursive formula for the expected return, can be extended to the return distribution.
dblp:conf/icml/MorimuraSKHT10
fatcat:hvsftuminvfhrhw4hhlas6dyjq
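
The abstract contrasts the usual expected-return objective with risk-sensitive criteria such as the expected shortfall, which can only be read off the full return distribution. As a rough illustration of that distinction (not the paper's nonparametric estimator or its distributional Bellman recursion), the sketch below builds an empirical return distribution by Monte Carlo rollouts under a fixed policy and computes the expected shortfall from it; the environment hooks step_fn and reset_fn and the toy one-step MDP are assumptions made up for this example.

    import numpy as np


    def sample_returns(step_fn, reset_fn, policy, gamma=0.99,
                       n_episodes=1000, max_steps=200):
        """Empirical return distribution of a fixed policy via Monte Carlo rollouts.

        step_fn(state, action) -> (next_state, reward, done) and reset_fn() -> state
        are assumed environment hooks; any episodic MDP interface can be adapted.
        """
        returns = np.empty(n_episodes)
        for i in range(n_episodes):
            state, g, discount = reset_fn(), 0.0, 1.0
            for _ in range(max_steps):
                state, reward, done = step_fn(state, policy(state))
                g += discount * reward
                discount *= gamma
                if done:
                    break
            returns[i] = g
        return returns


    def expected_shortfall(returns, alpha=0.1):
        """Mean of the worst alpha-fraction of returns (conditional value-at-risk)."""
        cutoff = np.quantile(returns, alpha)   # value-at-risk threshold at level alpha
        return returns[returns <= cutoff].mean()


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy one-step MDP (hypothetical): the single action yields a heavy-tailed
        # reward, so the mean looks benign while the expected shortfall exposes
        # the downside risk that an expected-return criterion ignores.
        reset = lambda: 0
        step = lambda state, action: (0, float(rng.standard_t(df=2)), True)
        policy = lambda state: 0
        rets = sample_returns(step, reset, policy, n_episodes=5000)
        print("mean return:", round(rets.mean(), 3))
        print("expected shortfall (alpha=0.1):", round(expected_shortfall(rets), 3))

Comparing the two printed numbers shows why a return-distribution approximation is useful: two policies with the same mean return can have very different expected shortfalls.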