Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes

Akifumi Wachi, Yanan Sui, Yisong Yue, Masahiro Ono
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and the Thirtieth Innovative Applications of Artificial Intelligence Conference, 2018
We present a reinforcement learning approach for exploring and optimizing a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize the discounted cumulative reward while keeping the probability of entering unsafe states, as defined by a safety function, within a given tolerance. The safety values of the states are not known a priori, and we model them probabilistically via a Gaussian Process (GP) prior. Behaving properly in such an environment therefore requires balancing a three-way trade-off between exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes the exploration of a state if visiting that state significantly improves knowledge of the achievable cumulative reward. Our approach relies on a novel information gain criterion based on Gaussian Process representations of the reward and safety functions. We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using real Martian terrain data.
doi:10.1609/aaai.v32i1.12103
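
The abstract's core idea of modeling an unknown safety function with a GP and only treating states as safe when the pessimistic estimate clears a threshold can be illustrated with a minimal sketch. This is not the paper's algorithm; the threshold h, the confidence scaling beta, the grid world, and the observations below are all illustrative assumptions.

```python
# Hypothetical sketch of GP-based safe-state classification (not the paper's full method):
# fit a GP to the safety values observed at visited states, then mark a state as safe
# only if the lower confidence bound of its predicted safety value exceeds a threshold h.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

h = 0.0      # assumed safety threshold: states with safety value >= h count as safe
beta = 2.0   # assumed confidence scaling for the pessimistic (lower-bound) estimate

# 2-D grid of state coordinates; a few visited states with noisy safety observations.
states = np.array([[x, y] for x in range(10) for y in range(10)], dtype=float)
visited = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
observed_safety = np.array([1.2, 0.9, 1.1])  # illustrative measurements

gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), alpha=1e-2)
gp.fit(visited, observed_safety)

mean, std = gp.predict(states, return_std=True)
lower = mean - beta * std        # pessimistic estimate of the safety value
safe_mask = lower >= h           # candidate safe set: safe with high probability

print(f"{safe_mask.sum()} of {len(states)} states currently classified as safe")
```

In such a scheme, exploration would be restricted to the candidate safe set, and the trade-off described in the abstract would then decide which safe state to visit next, e.g. by weighing how much a visit reduces uncertainty about the safety and reward functions against the reward it yields.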