Pseudo-MDPs and factored linear action models

Hengshuai Yao, Csaba Szepesvari, Bernardo Avila Pires, Xinhua Zhang
2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
In this paper we introduce the concept of pseudo-MDPs to develop abstractions. Pseudo-MDPs relax the requirement that the transition kernel has to be a probability kernel. We show that the new framework captures many existing abstractions. We also introduce the concept of factored linear action models, a special case, and discuss their relation to existing work. We use the general framework to develop a theory for bounding the suboptimality of policies derived from pseudo-MDPs. Specializing the framework, we recover existing results. We give a least-squares approach and a constrained optimization approach to learning the factored linear model, as well as efficient computation methods. We demonstrate that the constrained optimization approach gives better performance than the least-squares approach with normalization.

The value of a policy $\alpha$ at state $x$ is
$$V^\alpha(x) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t f_{A_t}(X_t) \,\Big|\, X_0 = x\right], \quad X_{t+1} \sim P^{A_t}(\cdot|X_t), \; A_t \sim \alpha(X_t, \cdot), \; t = 0, 1, 2, \dots$$
The optimal value of state $x$ is $V^*(x) = \sup_\alpha V^\alpha(x)$, giving rise to the optimal value function $V^*: \mathcal{X} \to \mathbb{R}$. For these definitions to make sense we need some further assumptions. First, for a measure $\mu$ over some measurable set $W$, introduce $L^1(\mu)$ to denote the space of $\mu$-integrable real-valued functions with domain $W$. Further, for a kernel $P^a$ let $L^1(P^a) = \cap_{x \in \mathcal{X}} L^1(P^a(\cdot|x))$. We also let $L^1(P) = \cap_{a \in \mathcal{A}} L^1(P^a) = \cap_{a \in \mathcal{A}, x \in \mathcal{X}} L^1(P^a(\cdot|x))$. We require that for any $a \in \mathcal{A}$, $f_a \in L^1(P^a)$, and further that for any measurable set $U \subset \mathcal{X}$ and $a \in \mathcal{A}$, $P^a(U|\cdot) \in L^1(P)$ (in particular, $x \mapsto P^a(U|x)$ must be measurable). These conditions ensure that the expectations are well-defined. Note that $L^1(P^a)$ and …
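The discounted return defining $V^\alpha$ above can be estimated by Monte Carlo rollouts. A minimal sketch on a hypothetical two-state, two-action MDP (all transition probabilities, rewards, and the policy here are illustrative assumptions, not taken from the paper):

```python
import random

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the paper).
# P[a][x] is the next-state distribution P^a(.|x); f[a][x] is the reward f_a(x).
P = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
     1: {0: [0.5, 0.5], 1: [0.7, 0.3]}}
f = {0: {0: 1.0, 1: 0.0},
     1: {0: 0.5, 1: 2.0}}
gamma = 0.9

def alpha(x):
    """A fixed stochastic policy alpha(x, .): action 0 with probability 0.6."""
    return 0 if random.random() < 0.6 else 1

def rollout_return(x0, horizon=200):
    """One truncated sample of sum_t gamma^t f_{A_t}(X_t) starting from X_0 = x0."""
    x, discount, total = x0, 1.0, 0.0
    for _ in range(horizon):
        a = alpha(x)
        total += discount * f[a][x]
        discount *= gamma
        x = random.choices([0, 1], weights=P[a][x])[0]
    return total

def mc_value(x0, n=2000):
    """Monte Carlo estimate of V^alpha(x0) by averaging sampled returns."""
    return sum(rollout_return(x0) for _ in range(n)) / n

random.seed(0)
print(mc_value(0))  # finite, since gamma < 1 and rewards are bounded
```

Since the rewards are bounded by 2 and $\gamma = 0.9$, any estimate must lie in $[0, 2/(1-\gamma)] = [0, 20]$, which is a quick sanity check on the simulation.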
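The least-squares approach mentioned in the abstract fits, per action, a matrix mapping current-state features to expected next-state features. A minimal NumPy sketch for a single action on synthetic data (the feature dimension, the hidden generating model `F_true`, and the noise level are all illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for one action a: d-dimensional features phi(x), with
# next-state features generated by a hidden linear model plus small noise.
d, n = 4, 500
F_true = rng.normal(size=(d, d)) / d
Phi = rng.normal(size=(n, d))                  # rows: phi(x_i) for sampled states
Phi_next = Phi @ F_true.T + 0.01 * rng.normal(size=(n, d))  # rows: phi(x_i')

# Least-squares estimate of F^a: minimize sum_i ||phi(x_i') - F phi(x_i)||^2.
# lstsq solves Phi @ X ~= Phi_next, so F_hat = X.T satisfies phi(x') ~= F_hat phi(x).
X, *_ = np.linalg.lstsq(Phi, Phi_next, rcond=None)
F_hat = X.T

# The learned model predicts expected next features from current features.
x_feat = rng.normal(size=d)
pred_next = F_hat @ x_feat

print(np.linalg.norm(F_hat - F_true))          # small when the noise is low
```

With enough samples and low noise, the recovered `F_hat` is close to the generating matrix; the paper's constrained optimization variant instead imposes constraints during the fit, which the abstract reports performing better than plain least squares with normalization.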
doi:10.1109/adprl.2014.7010633 dblp:conf/adprl/YaoSPZ14 fatcat:4l7aaf5bsrczzdlpdptb3o7xqm