Intelligent Model Learning Based on Variance for Bayesian Reinforcement Learning

Shuhua You, Quan Liu, Zongzhang Zhang, Hui Wang, Xiaofang Zhang
2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI)
We consider a modular approach to reinforcement learning that represents uncertainty in model parameters by maintaining probability distributions over them. The algorithm, which we call MBDP (model-based Bayesian dynamic programming), can be decomposed into two parallel types of inference: model learning and policy learning. During model learning, we update the posterior distributions of the model from the observations received after taking an action in each state. During policy learning, we solve MDPs by dynamic programming with greedy approximation so that the agent chooses behaviors that maximize return under the estimated model. Furthermore, we propose a principled method that uses the variance of Dirichlet distributions to determine when to learn and relearn the model. We demonstrate that MBDP can find near-optimal policies with high probability given sufficient model learning, and experimental results show that MBDP outperforms current state-of-the-art methods in reinforcement learning.
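The two kinds of inference described in the abstract can be illustrated with a minimal sketch. This is not the authors' MBDP implementation: the function names, the tabular count array `alpha` of shape (states, actions, states), the fixed variance `threshold`, and the use of plain value iteration on the posterior-mean model are all assumptions made for illustration. The only facts relied on are the standard Dirichlet mean and variance formulas and the standard value-iteration backup.

```python
import numpy as np

def dirichlet_mean(alpha):
    # Posterior mean of transition probabilities: alpha_i / alpha_0.
    return alpha / alpha.sum(axis=-1, keepdims=True)

def dirichlet_variance(alpha):
    # Var[p_i] = alpha_i * (alpha_0 - alpha_i) / (alpha_0^2 * (alpha_0 + 1)).
    a0 = alpha.sum(axis=-1, keepdims=True)
    return alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1.0))

def update_posterior(alpha, s, a, s_next):
    # Model learning: observing (s, a, s_next) adds one pseudo-count in place.
    alpha[s, a, s_next] += 1.0

def needs_more_learning(alpha, s, a, threshold):
    # Hypothetical criterion (an assumption, not the paper's rule):
    # keep learning (s, a) while its largest posterior variance is high.
    return dirichlet_variance(alpha[s, a]).max() > threshold

def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    # Policy learning: dynamic programming on an estimated model.
    # P has shape (S, A, S); R has shape (S, A).
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)      # Bellman backup, shape (S, A)
        V_new = Q.max(axis=1)        # greedy over actions
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)
```

In use, one would start from a uniform prior such as `alpha = np.ones((S, A, S))`, call `update_posterior` after each transition, and rerun `value_iteration(dirichlet_mean(alpha), R)` whenever the variance check indicates that the model for the visited state-action pairs has changed enough to matter.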
doi:10.1109/ictai.2015.37 dblp:conf/ictai/YouLZWZ15 fatcat:d4k6tjqk3reetfbhg5pqy5gn2u