An Improved Sarsa(λ) Reinforcement Learning Algorithm for Wireless Communication Systems

Hao Jiang, Renjie Gui, Zhen Chen, Liang Wu, Jian Dang, Jie Zhou
IEEE Access, vol. 7, 2019
In this article, we propose a novel, improved model-free temporal-difference control algorithm, namely Expected Sarsa(λ), which uses an average value as the update target and introduces eligibility traces, for wireless communication networks. In particular, we construct the update target from the average action value of all possible successor actions, and we apply eligibility traces to record the historical visits to every state-action pair, which greatly improves the algorithm's convergence and learning efficiency. Numerical results demonstrate that, in the tabular case of a finite Markov decision process, the proposed algorithm achieves higher learning efficiency and tolerates a wider range of learning rates than Q-learning, Sarsa, Expected Sarsa, and Sarsa(λ), thereby providing an efficient solution for the study and design of wireless communication networks and, more broadly, an effective basis for designing future intelligent communication networks.

INDEX TERMS: Model-free reinforcement learning, Sarsa, Q-learning, eligibility traces.
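To make the update described in the abstract concrete, the following is a minimal sketch of one tabular Expected Sarsa(λ) learning step, assuming an ε-greedy behavior policy, accumulating eligibility traces, and NumPy array tables; the function name, parameter names, and array layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def expected_sarsa_lambda_step(Q, e, s, a, r, s_next,
                               alpha, gamma, lam, epsilon):
    """One tabular Expected Sarsa(lambda) step (illustrative sketch).

    Q: |S| x |A| action-value table; e: eligibility-trace table of the
    same shape. All names here are assumptions for illustration only.
    """
    n_actions = Q.shape[1]

    # Update target: the probability-weighted average action value over
    # all possible successor actions, here under an epsilon-greedy policy.
    pi = np.full(n_actions, epsilon / n_actions)
    pi[np.argmax(Q[s_next])] += 1.0 - epsilon
    expected_q = float(np.dot(pi, Q[s_next]))

    # Temporal-difference error against the expected target.
    delta = r + gamma * expected_q - Q[s, a]

    # The trace records the historical visit to the pair (s, a) ...
    e[s, a] += 1.0

    # ... so every previously visited pair is updated in proportion to
    # its trace, after which all traces decay by gamma * lambda.
    Q += alpha * delta * e
    e *= gamma * lam
    return Q, e
```

Relative to Sarsa, which bootstraps from a single sampled next action, the expected target averages out the sampling noise of action selection, while the traces spread each update backward over recently visited state-action pairs; together these are what the abstract credits for the improved convergence and learning-rate tolerance.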
… environmental information [5]. Reinforcement learning does not require prior environmental information; it relies only on interaction with the environment, conducting a trial-and-error process and accumulating experience to learn the optimal control policy. Reinforcement learning has recently found inspiring applications, especially in smart machines [6], and its framework has also been widely studied in the field of communications [7], [8], including beam selection and allocation, time-domain and frequency-domain resource allocation, energy control, and scheduling of cooperative networks. The authors in [9] proposed a fast, online machine learning algorithm for fifth-generation (5G) vehicle-to-everything communication environments and verified its accuracy using data from Google Maps. The authors in [10] proposed a Q-learning algorithm based on a distributed reinforcement learning framework to solve cooperative retransmission in a wireless network; the algorithm optimized the network's throughput per unit energy by adjusting the transmission probability and energy. The authors in [11] introduced a Q-learning-based algorithm to investigate the power control problem in a cooperative …

LIANG WU received the B.S., M.S., and Ph.D. degrees from …, where he is currently an Associate Professor. His research interests include optical wireless communications, multiple-input multiple-output technology, interference alignment, and wireless indoor localization.

JIAN DANG (M'15) received the B.S. degree in information engineering and the Ph.D. degree in information and communications engineering from Southeast University, …
doi:10.1109/access.2019.2935255