A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is
Two popular approaches to model-free continuous control tasks are SAC and TD3. At first glance these approaches seem rather different; SAC aims to solve the entropy-augmented MDP by minimising the KL-divergence between a stochastic proposal policy and a hypotheical energy-basd soft Q-function policy, whereas TD3 is derived from DPG, which uses a deterministic policy to perform policy gradient ascent along the value function. In reality, both approaches are remarkably similar, and belong to aarXiv:2101.11331v2 fatcat:swv6qbihkbguxdakt5snz5xrnm