Predictive representations for policy gradient in POMDPs

Abdeslam Boularias, Brahim Chaib-draa
2009 Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09  
We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive State Representations (PSRs). We compare PSR policies to Finite-State Controllers (FSCs), which are considered a standard model for policy gradient methods in POMDPs. We present a general Actor-Critic algorithm for learning both FSCs and PSR policies. The critic part computes a value function whose variables are the parameters of the policy. These parameters are gradually updated to maximize the value function. We show that the value function is polynomial for both FSCs and PSR policies, with a potentially smaller degree in the case of PSR policies. Therefore, the value function of a PSR policy can have fewer local optima than that of the equivalent FSC, and consequently, the gradient algorithm is more likely to converge to a globally optimal solution.
doi:10.1145/1553374.1553383 dblp:conf/icml/BoulariasC09 fatcat:vzzyomydzfabjmzktulnsin5v4
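
The abstract describes a critic that evaluates the value function in terms of the policy parameters and an actor that follows its gradient. The following is a minimal sketch of that idea for an FSC only, not the authors' algorithm: the two-state POMDP, the names `fsc_value`, `value_of`, `gradient` and `ascend`, the finite horizon, and the use of finite differences in place of an analytic gradient are all assumptions made for illustration. Because the finite-horizon value is built only from sums and products of the entries of `psi` and `eta`, it is a polynomial in those parameters.

```python
def fsc_value(T, O, R, psi, eta, b0, n0, horizon):
    """Expected undiscounted return of a stochastic FSC over a finite horizon.

    T[s][a][s2]: transition probabilities, O[s2][a][o]: observation probabilities,
    R[s][a]: rewards, psi[n][a]: action probabilities of controller node n,
    eta[n][a][o][n2]: node transition probabilities, b0: initial belief.
    """
    S, A, N = len(R), len(R[0]), len(psi)
    n_obs = len(O[0][0])
    V = [[0.0] * S for _ in range(N)]  # V[n][s] with zero steps to go
    for _ in range(horizon):
        new_V = [[0.0] * S for _ in range(N)]
        for n in range(N):
            for s in range(S):
                total = 0.0
                for a in range(A):
                    future = 0.0
                    for s2 in range(S):
                        for o in range(n_obs):
                            for n2 in range(N):
                                future += (T[s][a][s2] * O[s2][a][o]
                                           * eta[n][a][o][n2] * V[n2][s2])
                    total += psi[n][a] * (R[s][a] + future)
                new_V[n][s] = total
        V = new_V
    return sum(b0[s] * V[n0][s] for s in range(S))


# Hypothetical toy POMDP: the hidden state never changes, the observation
# matches the state with probability 0.8, and reward 1 is paid when the
# action index equals the state index.
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
O = [[[0.8, 0.2], [0.8, 0.2]], [[0.2, 0.8], [0.2, 0.8]]]
R = [[1.0, 0.0], [0.0, 1.0]]
B0 = [0.5, 0.5]
# Fixed memory update (an assumption): the next node is the last observation.
ETA = [[[[1.0 if n2 == o else 0.0 for n2 in range(2)]
         for o in range(2)] for a in range(2)] for n in range(2)]


def value_of(theta, horizon=3):
    """Value as a function of theta[n] = P(action n | node n)."""
    psi = [[theta[0], 1.0 - theta[0]], [1.0 - theta[1], theta[1]]]
    return fsc_value(T, O, R, psi, ETA, B0, 0, horizon)


def gradient(theta, eps=1e-6):
    """Central finite-difference estimate of the policy gradient."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((value_of(up) - value_of(down)) / (2 * eps))
    return grad


def ascend(theta, steps=50, lr=0.2):
    """Projected gradient ascent that keeps each probability in [0, 1]."""
    for _ in range(steps):
        g = gradient(theta)
        theta = [min(1.0, max(0.0, t + lr * gi)) for t, gi in zip(theta, g)]
    return theta
```

Calling `ascend([0.5, 0.5])` moves both parameters toward acting on the last observation. A PSR policy would replace the controller nodes with a vector of predictions about future observations, which this sketch does not attempt to model.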