Generalizing Movements with Information-Theoretic Stochastic Optimal Control

Rudolf Lioutikov, Alexandros Paraschos, Jan Peters, Gerhard Neumann
2014 Journal of Aerospace Information Systems  
Stochastic Optimal Control (SOC) is typically used to plan a movement for a specific situation. Most SOC methods cannot generalize this movement plan to a new situation without re-planning. We present an SOC method whose policy can be reused in a new situation, because the policy is more robust to slight deviations from the initial movement plan. To improve the robustness of the policy, we employ information-theoretic policy updates that explicitly operate on distributions instead of single trajectories. To ensure a stable and smooth policy update, we limit the 'distance' between the trajectory distributions of the old and the new control policy. The introduced bound admits a closed-form solution for the resulting policy and extends results from recent developments in SOC. Unlike many standard SOC algorithms, our approach can infer the system dynamics directly from data points and can therefore also be used for model-based reinforcement learning. This paper extends [16]. In addition to revisiting that content, we provide an extensive theoretical comparison of our approach with related work, discuss additional aspects of the implementation, and present further evaluations.
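The abstract does not spell out the update itself. As a rough intuition for what bounding the 'distance' between old and new trajectory distributions does, here is a minimal sketch of the generic KL-bounded reweighting (in the style of relative entropy policy search), assuming a finite set of sampled trajectories with old-policy probabilities p_i and returns R_i. Maximizing the expected return subject to KL(q || p) <= epsilon has the well-known closed-form solution q_i proportional to p_i * exp(R_i / eta), where the temperature eta is chosen so that the bound holds. This is a textbook illustration, not the authors' SOC algorithm. The function names `tilt`, `kl_divergence`, and `kl_bounded_update`, the KL direction, and the bisection search for eta are all assumptions made for illustration.

```python
import math


def tilt(p, rewards, eta):
    """Return q with q_i proportional to p_i * exp(R_i / eta)."""
    r_max = max(rewards)  # shift so the largest exponent is 0 (avoids overflow)
    weights = [p_i * math.exp((r_i - r_max) / eta) for p_i, r_i in zip(p, rewards)]
    z = sum(weights)  # the argmax term contributes p_max > 0, so z > 0
    return [w / z for w in weights]


def kl_divergence(q, p):
    """KL(q || p) for discrete distributions; terms with q_i == 0 contribute 0."""
    return sum(q_i * math.log(q_i / p_i) for q_i, p_i in zip(q, p) if q_i > 0.0)


def kl_bounded_update(p, rewards, epsilon, eta_lo=1e-6, eta_hi=1e6, iters=200):
    """Hypothetical helper: reweight sampled trajectories to raise the expected
    return while keeping KL(q || p) <= epsilon.

    The KL of the tilted distribution to p shrinks monotonically as eta grows,
    so bisecting on log(eta) finds (within the search range) the smallest
    eta, i.e. the greediest update, that still satisfies the bound.
    """
    for _ in range(iters):
        eta_mid = math.sqrt(eta_lo * eta_hi)  # geometric midpoint: bisection in log space
        if kl_divergence(tilt(p, rewards, eta_mid), p) > epsilon:
            eta_lo = eta_mid  # update too greedy: raise the temperature
        else:
            eta_hi = eta_mid  # bound satisfied: try a greedier update
    return tilt(p, rewards, eta_hi)


if __name__ == "__main__":
    p_old = [0.25, 0.25, 0.25, 0.25]  # old policy: 4 equally likely sampled trajectories
    returns = [0.0, 1.0, 2.0, 3.0]    # return R_i of each sampled trajectory
    q_new = kl_bounded_update(p_old, returns, epsilon=0.1)
    # Probability mass shifts toward high-return trajectories; KL sits at (or just below) 0.1.
    print([round(q, 3) for q in q_new], round(kl_divergence(q_new, p_old), 3))
```

A small epsilon keeps the new distribution close to the old one, which is the stabilizing effect the abstract attributes to the bound; letting epsilon grow makes the update approach a greedy jump to the best sample.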
doi:10.2514/1.i010195