Selectively decentralized Q-learning

Thanh Nguyen, Snehasis Mukhopadhyay
2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Decentralization has been widely applied in Q-learning [8], one of the best-known model-free techniques for learning in unknown environments [9] [10] [11] [12] [13], especially when the systems are naturally distributed [14, 15]. In Q-learning, the learning agent maintains estimates of the optimal values for all state-action entries in its Q-table. In each state, the agent chooses the action with the highest Q-table entry for that state. After each visit, the agent updates the Q-value of the previous state-action pair using the new state's reward and its highest Q-value (see the sketch below). Most decentralized Q-learning approaches adopt the partial-communication idea: each subsystem manages its own communication and updates its own Q-table. Although decentralized Q-learning is well established, two questions remain open. First, how well does decentralized Q-learning address the slow convergence that is a known weakness of Q-learning [16]? Second, how can multi-model switching be applied in decentralized Q-learning, which has not been thoroughly explored?
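The update described above is the standard tabular Q-learning rule, Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) ]. As a minimal sketch of the partial-communication idea, the Python fragment below gives each subsystem its own locally updated Q-table; the class name, the greedy action choice, and the values of alpha and gamma are illustrative assumptions, not the authors' implementation.

import numpy as np

class SubsystemQAgent:
    """One subsystem's learner: it keeps and updates only its own Q-table."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.q = np.zeros((n_states, n_actions))  # local Q-table
        self.alpha = alpha                         # learning rate (assumed value)
        self.gamma = gamma                         # discount factor (assumed value)

    def act(self, state):
        # Choose the action with the highest Q-table entry for this state.
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        # Move Q(s, a) toward r + gamma * max_a' Q(s', a') for the observed transition.
        target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (target - self.q[state, action])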
doi:10.1109/smc.2017.8122624 dblp:conf/smc/NguyenM17 fatcat:6tkaunvhrbh7zlw7taigpft5o4