A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is application/pdf.
Empirical Dynamic Programming
2016
Mathematics of Operations Research
We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get 'empirical value iteration' (EVI). Policy evaluation and policy improvement in classical policy iteration are also replaced by simulation to get 'empirical policy iteration' (EPI). Thus, these empirical dynamic programming algorithms involve iteration of a random
doi:10.1287/moor.2015.0733
fatcat:jjt6s3jbyvcxjlmhpzadbcsw5m
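
The abstract describes replacing the exact expectation in the Bellman operator with an empirical estimate. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a finite, discounted-cost MDP with a simulator for next states. The names `empirical_value_iteration`, `sample_next`, `cost`, the two-state toy MDP, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def empirical_value_iteration(states, actions, cost, sample_next,
                              gamma=0.9, n_samples=200, n_iters=100, seed=0):
    """Value iteration where the expectation over next states is replaced
    by a sample mean over n_samples simulated transitions (assumed setup)."""
    rng = random.Random(seed)
    v = {s: 0.0 for s in states}
    for _ in range(n_iters):
        new_v = {}
        for s in states:
            best = float("inf")
            for a in actions:
                # Empirical estimate of E[v(next state) | s, a]:
                # fresh samples are drawn at every iteration.
                est = sum(v[sample_next(s, a, rng)]
                          for _ in range(n_samples)) / n_samples
                best = min(best, cost(s, a) + gamma * est)
            new_v[s] = best
        v = new_v
    return v


# Hypothetical toy MDP: state 1 costs 1 per step, state 0 costs nothing.
# "stay" keeps the current state with probability 0.9, "move" with 0.1.
def cost(s, a):
    return 1.0 if s == 1 else 0.0


def sample_next(s, a, rng):
    stay_prob = 0.9 if a == "stay" else 0.1
    return s if rng.random() < stay_prob else 1 - s


v = empirical_value_iteration([0, 1], ["stay", "move"], cost, sample_next)
print(v)
```

For this toy problem the exact value function is v(0) = 0.9 and v(1) = 1.9, so the printed estimates should land near those values but will not match them exactly. Because each iteration draws new samples, the iterates form a random sequence that fluctuates around the exact solution rather than converging to a single fixed point.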