We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive State Representations (PSRs). We compare PSR policies to Finite-State Controllers (FSCs), which are considered a standard model for policy gradient methods in POMDPs. We present a general Actor-Critic algorithm for learning both FSCs and PSR policies. The critic part computes a value function that has as variables

doi:10.1145/1553374.1553383
dblp:conf/icml/BoulariasC09
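The Actor-Critic scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's PSR-based method: it is a generic tabular actor-critic on an invented toy two-state MDP (fully observable, softmax policy, TD(0) critic), and all names, constants, and dynamics below are assumptions made for illustration only.

```python
# Minimal tabular actor-critic sketch on an invented toy 2-state, 2-action MDP.
# Illustrative only: the paper's method uses PSR-based policies in POMDPs,
# which are not reproduced here.
import math
import random

random.seed(0)

N_STATES, N_ACTIONS = 2, 2
ALPHA_ACTOR, ALPHA_CRITIC, GAMMA = 0.05, 0.1, 0.95

# Softmax policy parameters (actor) and state values (critic).
theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
values = [0.0] * N_STATES

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def step(state, action):
    # Toy dynamics: action 0 keeps the state, action 1 flips it.
    # Reward 1 for landing in state 1, else 0.
    next_state = state if action == 0 else 1 - state
    return next_state, float(next_state == 1)

def sample(probs):
    r, acc = random.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if r < acc:
            return a
    return len(probs) - 1

state = 0
for _ in range(5000):
    probs = softmax(theta[state])
    action = sample(probs)
    next_state, reward = step(state, action)
    # Critic: TD(0) error drives both the value and the policy updates.
    td_error = reward + GAMMA * values[next_state] - values[state]
    values[state] += ALPHA_CRITIC * td_error
    # Actor: policy-gradient step for a softmax policy
    # (grad log-prob of the taken action is indicator minus probability).
    for a in range(N_ACTIONS):
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[state][a] += ALPHA_ACTOR * td_error * grad
    state = next_state
```

After training, the learned policy should favour flipping to the rewarding state from state 0 and staying there once reached; the paper's contribution is to replace the fully observable state above with a predictive state representation suited to POMDPs.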