Function Approximation Based Reinforcement Learning for Edge Caching in Massive MIMO Networks
IEEE Transactions on Communications
Caching popular contents in advance is an important technique to achieve low latency and reduced backhaul congestion in future wireless communication systems. In this article, a multi-cell massive multi-input-multi-output system is considered, where locations of base stations are distributed as a Poisson point process. Assuming probabilistic caching, average success probability (ASP) of the system is derived for a known content popularity (CP) profile, which in practice is time-varying and
... me-varying and unknown in advance. Further, modeling CP variations across time as a Markov process, reinforcement Q-learning is employed to learn the optimal content placement strategy to optimize the long-term-discounted ASP and average cache refresh rate. In the Q-learning, the number of Q-updates are large and proportional to the number of states and actions. To reduce the space complexity and update requirements towards scalable Q-learning, two novel (linear and non-linear) function approximations-based Q-learning approaches are proposed, where only a constant (4 and 3 respectively) number of variables need updation, irrespective of the number of states and actions. Convergence of these approximation-based approaches are analyzed. Simulations verify that these approaches converge and successfully learn the similar best content placement, which shows the successful applicability and scalability of the proposed approximated Q-learning schemes. Index Terms-Linear function approximation, massive MIMO, non-linear function approximation, Poisson point process, Q-learning, wireless edge caching.