The file type is application/pdf.
Temporal Consistency-Based Loss Function for Both Deep Q-Networks and Deep Deterministic Policy Gradients for Continuous Actions
2021
Symmetry
Artificial intelligence (AI) techniques for power grid control and for energy management in building automation require both deep Q-networks (DQNs) and deep deterministic policy gradients (DDPGs), the off-policy algorithms of deep reinforcement learning (DRL). Most studies on improving the stability of DRL address this with replay buffers and a target network that applies a delayed temporal difference (TD) backup, which minimizes a loss function at every iteration. The loss functions …
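The delayed TD backup with a target network that the abstract refers to can be sketched as follows. This is a minimal PyTorch illustration of the standard DQN loss, not the paper's proposed temporal-consistency loss; the names q_net, target_net, and the batch fields are hypothetical.

```python
import torch
import torch.nn.functional as F

def dqn_td_loss(q_net, target_net, batch, gamma=0.99):
    """Standard delayed-TD loss for a DQN (illustrative sketch).

    The bootstrap target is computed with a periodically synced,
    frozen target network, and the batch is assumed to come from
    a replay buffer, as described in the abstract.
    """
    s, a, r, s_next, done = batch  # tensors sampled from the replay buffer

    # Q(s, a) from the online network for the actions actually taken
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    # Delayed TD backup: the target uses the frozen target network
    with torch.no_grad():
        max_q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * max_q_next

    # Loss minimized at every iteration
    return F.mse_loss(q_sa, target)
```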
doi:10.3390/sym13122411
fatcat:77nroszv5bhj7kcl4jeha5t2am