A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks
2022
International Conference on Machine Learning
In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to reduce overestimation, including several recent ensemble methods; however, none has achieved sample-efficient learning by addressing estimation variance as the root cause of overestimation. In this paper, we propose MeanQ, a simple ensemble method that estimates target values as ensemble means.
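To make the core idea concrete, here is a minimal sketch of an ensemble-mean TD target of the kind the abstract describes: the ensemble members' Q-estimates are averaged before the max is taken, so the max operates on a lower-variance estimate. The function name, array shapes, and terminal masking are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ensemble_mean_td_target(rewards, next_q_values, dones, gamma=0.99):
    """Hypothetical sketch of an ensemble-mean TD target.

    rewards:       shape (B,)       -- one-step rewards for a batch of B transitions
    next_q_values: shape (K, B, A)  -- K ensemble members' Q-estimates
                                       for B next states over A actions (assumed layout)
    dones:         shape (B,)       -- 1.0 if the transition is terminal, else 0.0
    """
    # Average across the K ensemble members first; this reduces the
    # variance of the value estimate that the max then operates on.
    mean_q = next_q_values.mean(axis=0)      # (B, A)
    # Greedy next-state value under the ensemble-mean estimate.
    max_mean_q = mean_q.max(axis=1)          # (B,)
    # Standard one-step TD target with terminal masking.
    return rewards + gamma * (1.0 - dones) * max_mean_q
```

Averaging before the max is the point of the sketch: taking the max of each member's estimate separately and then averaging would retain the overestimation bias that high-variance estimates induce.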
dblp:conf/icml/LiangXMHIAF22
fatcat:2frtnecai5dsrjz7tdjgltdvya