Policy Iterations Without Selection Property

Assale Adje
unpublished
In this paper, we propose a modified policy iterations algorithm that does not rely on the selection property. The selection property is the key argument for making improvements during policy iterations: a new policy is computed as an optimal solution of a minimization problem. However, in some cases, it might be difficult to prove that an optimal solution exists. To overcome this issue, the new policy is instead computed as a guaranteed sub-optimal solution of the minimization problem. A good choice of the perturbation parameters preserves the advantages of the original policy iterations algorithm, such as the computation of a post-fixed point at each step and the convergence to a fixed point.
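The difference between the two selection steps can be written out as follows. This is only a sketch in assumed notation, not the paper's own: $\Pi$ is a set of policies, each $f^{\pi}$ is a real-valued map, $F$ is their pointwise infimum, $x_k$ is the current iterate, and $\varepsilon_k > 0$ stands in for a perturbation parameter.

```latex
% Assumed notation (not from the paper): F is the pointwise infimum of the f^\pi.
F(x) = \inf_{\pi \in \Pi} f^{\pi}(x)

% Classical step: requires the infimum at x_k to be attained (selection property).
\pi_{k+1} \in \Pi \quad \text{such that} \quad f^{\pi_{k+1}}(x_k) = F(x_k)

% Modified step: an \varepsilon_k-suboptimal policy, which exists whenever F(x_k) is finite.
\pi_{k+1} \in \Pi \quad \text{such that} \quad f^{\pi_{k+1}}(x_k) \le F(x_k) + \varepsilon_k
```

A policy satisfying the last condition exists by the definition of the infimum alone, with no attainment argument needed. According to the abstract, it is the choice of the perturbation parameters that keeps the post-fixed point and convergence properties intact.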
doi:10.29007/9rn9 fatcat:5mmzsb3l5bcbrbjn5nmapwgcde