Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors

Xiao Wu, Yanqiu Tang
2021 Kybernetika (Praha)  
This paper focuses on the constrained optimality of discrete-time Markov decision processes (DTMDPs) with state-dependent discount factors, a Borel state space, a compact Borel action space, and possibly unbounded costs. Using properties of the so-called occupation measures of policies, and transforming the original constrained optimality problem for DTMDPs into a convex program, we prove the existence of an optimal randomized stationary policy under reasonable conditions.
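The paper works with Borel spaces. The sketch below is only a finite-state, finite-action analogue meant to illustrate why occupation measures turn the constrained problem into a convex one. It assumes a hypothetical two-state model with made-up transition probabilities, costs, and discounts, and it assumes that the discount accumulates as the product alpha(x_0)...alpha(x_{t-1}) of the factors at the states visited. All function names are illustrative. It checks three facts in this toy setting. First, the expected discounted cost equals the pairing <eta, c> of the occupation measure eta with the cost. Second, eta satisfies linear balance equations. Third, a convex combination of two occupation measures is again the occupation measure of a randomized stationary policy.

```python
# Finite-state sketch (hypothetical toy model, not from the paper): occupation
# measures of an MDP whose discount factor alpha(x) depends on the current state.
# Assumed criterion: E[ sum_t alpha(x_0)...alpha(x_{t-1}) * c(x_t, a_t) ].

STATES = [0, 1]
ACTIONS = [0, 1]
ALPHA = [0.9, 0.5]                          # hypothetical state-dependent discounts
P = [[[0.8, 0.2], [0.1, 0.9]],              # P[x][a][y]: transition probabilities
     [[0.5, 0.5], [0.3, 0.7]]]
COST = [[1.0, 3.0], [4.0, 0.5]]             # hypothetical objective cost c0(x, a)
CONSTRAINT_COST = [[0.0, 1.0], [2.0, 0.0]]  # hypothetical constraint cost c1(x, a)
NU = [1.0, 0.0]                             # initial distribution


def solve(A, b):
    """Solve the small dense system A z = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def discounted_kernel(pi):
    """Q[x][y] = alpha(x) * sum_a pi(a|x) P(y|x,a) for a stationary policy pi."""
    return [[ALPHA[x] * sum(pi[x][a] * P[x][a][y] for a in ACTIONS)
             for y in STATES] for x in STATES]


def occupation_measure(pi):
    """eta(x,a) = m(x) pi(a|x), where the state marginal m solves m = nu + Q^T m."""
    Q = discounted_kernel(pi)
    A = [[(1.0 if y == x else 0.0) - Q[x][y] for x in STATES] for y in STATES]
    m = solve(A, NU)
    return [[m[x] * pi[x][a] for a in ACTIONS] for x in STATES]


def policy_value(pi, c):
    """Expected discounted cost from NU via policy evaluation: V = c_pi + Q V."""
    Q = discounted_kernel(pi)
    A = [[(1.0 if x == y else 0.0) - Q[x][y] for y in STATES] for x in STATES]
    c_pi = [sum(pi[x][a] * c[x][a] for a in ACTIONS) for x in STATES]
    V = solve(A, c_pi)
    return sum(NU[x] * V[x] for x in STATES)


def pair(eta, c):
    """<eta, c>: the discounted cost is linear in the occupation measure."""
    return sum(eta[x][a] * c[x][a] for x in STATES for a in ACTIONS)


def balance_residual(eta):
    """Max violation of sum_a eta(y,a) = nu(y) + sum_{x,a} alpha(x) P(y|x,a) eta(x,a)."""
    return max(abs(sum(eta[y]) - NU[y]
                   - sum(ALPHA[x] * P[x][a][y] * eta[x][a]
                         for x in STATES for a in ACTIONS))
               for y in STATES)


def mix_measures(e1, e2, lam):
    """Convex combination lam * e1 + (1 - lam) * e2 of two occupation measures."""
    return [[lam * e1[x][a] + (1 - lam) * e2[x][a] for a in ACTIONS] for x in STATES]


def policy_from_measure(eta):
    """Randomized stationary policy pi(a|x) = eta(x,a) / sum_b eta(x,b).
    Assumes every state has positive occupation mass under eta."""
    return [[eta[x][a] / sum(eta[x]) for a in ACTIONS] for x in STATES]


if __name__ == "__main__":
    pi1 = [[1.0, 0.0], [0.0, 1.0]]          # two deterministic stationary policies
    pi2 = [[0.0, 1.0], [1.0, 0.0]]
    e1, e2 = occupation_measure(pi1), occupation_measure(pi2)
    print("cost via <eta, c0>:", pair(e1, COST), "via evaluation:", policy_value(pi1, COST))
    mix = mix_measures(e1, e2, 0.3)         # still satisfies the balance equations
    pi_mix = policy_from_measure(mix)       # randomized stationary policy realizing it
    print("randomized policy:", pi_mix)
    print("objective:", pair(mix, COST), "constraint:", pair(mix, CONSTRAINT_COST))
    print("balance residual:", balance_residual(occupation_measure(pi_mix)))
```

In this finite analogue, the constrained problem amounts to minimizing <eta, c0> over nonnegative eta that satisfy the balance equations and <eta, c1> <= d for some bound d. That is a linear program, and the sketch leaves that step out. In the paper's Borel-space setting, the corresponding program is posed over measures rather than finite vectors.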
doi:10.14736/kyb-2021-2-0295