Constrained Variational Policy Optimization for Safe Reinforcement Learning [article]

Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao
2022
Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying to safety-critical applications. Primal-dual as a prevalent constrained optimization framework suffers from instability issues and lacks optimality guarantees. This paper overcomes the issues from a novel probabilistic inference perspective and proposes an Expectation-Maximization style approach to learn safe policy. We show that the safe RL problem can be decomposed to 1) a convex
more » ... ion phase with a non-parametric variational distribution and 2) a supervised learning phase. We show the unique advantages of constrained variational policy optimization by proving its optimality and policy improvement stability. A wide range of experiments on continuous robotic tasks show that the proposed method achieves significantly better performance in terms of constraint satisfaction and sample efficiency than primal-dual baselines.
doi:10.48550/arxiv.2201.11927 fatcat:q7whr25hobavteih2zygq7mkwe