A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is
In this paper, we propose a modified policy iterations algorithm which does not rely on the selection property. The selection property is the key argument to make improvements during policy iterations. Indeed, a new policy is computed as an optimal solution of a minimization problem. However, in some cases, it might be difficult to prove that an optimal solution exists. To overcome this issue, the new policy is computed as a guaranteed sub-optimal solution of the minimization problem. The gooddoi:10.29007/9rn9 fatcat:5mmzsb3l5bcbrbjn5nmapwgcde