Bisimulation Metrics for Continuous Markov Decision Processes

Norm Ferns, Prakash Panangaden, Doina Precup
2011 SIAM Journal on Computing (Print)
In recent years, various metrics have been developed for measuring the behavioural similarity of states in probabilistic transition systems [Desharnais et al. ...]. In the context of finite Markov decision processes, we have built on these metrics to provide a robust quantitative analogue of stochastic bisimulation [Ferns et al., Proceedings of UAI (2004), pp. 162-169] and an efficient algorithm for its calculation [Ferns et al., Proceedings of UAI (2006), pp. 174-181]. In this paper, we seek to [...]ly extend these bisimulation metrics to Markov decision processes with continuous state spaces. In particular, we provide the first distance-estimation scheme for metrics based on bisimulation for continuous probabilistic transition systems. Our work, based on statistical sampling and infinite-dimensional linear programming, is a crucial first step in formally guiding real-world planning, where tasks are usually continuous and highly stochastic in nature (e.g., robot navigation) and where a substitution with a parametric model or crude finite approximation must often be made. We show that the optimal value function associated with a discounted infinite-horizon planning task is continuous with respect to metric distances. Thus, our metrics allow one to reason about the quality of the solution obtained by replacing one model with another. Alternatively, they may potentially be used directly for state aggregation. An earlier version of this work appears in the doctoral thesis of Norm Ferns [McGill University, (2008)].
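
To make the idea of a bisimulation metric concrete, the following is a minimal sketch for a finite, deterministic special case only. It is not the sampling and infinite-dimensional linear programming scheme the abstract describes. It assumes the metric is the fixed point of d(s, t) = max_a [(1 - c) |r(s, a) - r(t, a)| + c K(d)(P(s, a), P(t, a))], where K(d) is the Kantorovich distance; for deterministic transitions that distance reduces to d between the two successor states. The toy MDP, the weight c, and all function names are hypothetical and chosen for illustration.

```python
# Hypothetical 3-state, 2-action deterministic MDP (illustrative only).
STATES = [0, 1, 2]
ACTIONS = [0, 1]
REWARD = {0: [0.0, 1.0], 1: [0.0, 0.9], 2: [0.5, 0.5]}  # REWARD[s][a] in [0, 1]
NEXT = {0: [1, 2], 1: [0, 2], 2: [2, 0]}                # NEXT[s][a]: successor


def bisim_metric(c=0.9, tol=1e-10, max_iter=10_000):
    """Iterate the assumed fixed-point operator until it stops changing.

    Transitions are deterministic, so the Kantorovich distance between the
    two next-state distributions is just d(successor of s, successor of t).
    """
    d = {(s, t): 0.0 for s in STATES for t in STATES}
    for _ in range(max_iter):
        new = {
            (s, t): max(
                (1 - c) * abs(REWARD[s][a] - REWARD[t][a])
                + c * d[(NEXT[s][a], NEXT[t][a])]
                for a in ACTIONS
            )
            for s in STATES
            for t in STATES
        }
        delta = max(abs(new[k] - d[k]) for k in d)
        d = new
        if delta < tol:
            break
    return d


def optimal_values(gamma=0.9, tol=1e-10, max_iter=10_000):
    """Standard value iteration for the discounted infinite-horizon task."""
    v = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        new = {
            s: max(REWARD[s][a] + gamma * v[NEXT[s][a]] for a in ACTIONS)
            for s in STATES
        }
        delta = max(abs(new[s] - v[s]) for s in STATES)
        v = new
        if delta < tol:
            break
    return v


if __name__ == "__main__":
    d = bisim_metric(c=0.9)
    v = optimal_values(gamma=0.9)
    for s in STATES:
        for t in STATES:
            if s < t:
                gap = (1 - 0.9) * abs(v[s] - v[t])
                print(f"d({s},{t}) = {d[(s, t)]:.4f}, scaled value gap = {gap:.4f}")
```

In this toy setting, with the discount factor no larger than c, the scaled gap (1 - c) |V*(s) - V*(t)| stays below d(s, t). This is a finite-state analogue of the continuity of the optimal value function with respect to metric distances stated in the abstract.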
doi:10.1137/10080484x fatcat:kz73ozpu4ne6pkcukb37nacwua