A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can …
doi:10.48550/arxiv.2201.09562 fatcat:paulnuq5rfajhkainfippwyska