A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network
2021
Wireless Communications and Mobile Computing
In reinforcement learning (RL), an agent learns an environment through trial and error. This behavior allows the agent to learn in complex and difficult environments. In RL, the agent normally learns the given environment by exploring or exploiting. Most algorithms suffer from under-exploration in the later stages of the episodes. Recently, an off-policy algorithm called soft actor critic (SAC) was proposed that overcomes this problem by maximizing entropy as it learns the environment.
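The entropy maximization the abstract refers to can be sketched as follows. This is a minimal illustration of the maximum-entropy RL objective that SAC optimizes, J(π) = E[r(s, a) + α·H(π(·|s))]; the names (`alpha`, `probs`, `reward`) and the toy discrete policies are illustrative assumptions, not taken from the paper.

```python
import math

def policy_entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) * log pi(a) for a discrete policy."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def soft_objective(reward, probs, alpha=0.2):
    """One-step entropy-augmented objective: reward + alpha * H(pi).

    alpha (the temperature) weights the exploration bonus, so policies
    that remain stochastic are rewarded relative to near-deterministic ones.
    """
    return reward + alpha * policy_entropy(probs)

# A uniform policy over 4 actions has maximal entropy log(4), so it
# receives a larger bonus than a nearly deterministic (peaked) policy.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
print(soft_objective(1.0, uniform))
print(soft_objective(1.0, peaked))
```

Because the bonus shrinks only as the policy itself becomes confident, the agent keeps exploring late into training, which is the under-exploration failure mode the abstract describes.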
doi:10.1155/2021/9920591
fatcat:tx2whyke3zb4fbg64g4l27wwiy