Approximate Policy Iteration for Semi-Markov Control Revisited

Abhijit Gosavi
2011 Procedia Computer Science  
The semi-Markov decision process can be solved via reinforcement learning without generating its transition model. We briefly review the existing algorithms based on approximate policy iteration (API) for solving this problem for discounted and average reward under the infinite horizon. API techniques have attracted significant interest in the literature recently. We first present and analyze an extension of an existing API algorithm for discounted reward that can handle continuous reward
... We then consider its average reward counterpart, which requires an update based on the stochastic shortest path (SSP). We study the convergence properties of the algorithm that does not require the SSP update.
doi:10.1016/j.procs.2011.08.046 fatcat:dcv6jiyq7zagpf6znvz5qngg64