Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation [article]

Xingang Guo, Bin Hu
2022 arXiv   pre-print
In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD) learning with linear function approximation. Our analysis hinges upon the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS). Then standard MJLS theory can be applied to quantify the mean and covariance matrix of the
more » ... timation error of the decentralized TD method at every time step. Various implications of our exact formulas on the algorithm performance are also discussed. An interesting finding is that under a necessary and sufficient stability condition, the mean-squared TD estimation error will converge to an exact limit at a specific exponential rate.
arXiv:2204.09801v1 fatcat:wtprk434obew3em4hqazgvh57u