Combining Tree Search and Action Prediction for State-of-the-Art Performance in DouDiZhu

Yunsheng Zhang, Dong Yan, Bei Shi, Haobo Fu, Qiang Fu, Hang Su, Jun Zhu, Ning Chen
2021 Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence   unpublished
AlphaZero has achieved superhuman performance on various perfect-information games, such as chess, shogi and Go. However, directly applying AlphaZero to imperfect-information games (IIG) is infeasible, due to the fact that traditional MCTS methods cannot handle missing information of other players. Meanwhile, there have been several extensions of MCTS for IIGs, by implicitly or explicitly sampling a state of other players. But, due to the inability to handle private and public information well,
more » ... the performance of these methods is not satisfactory. In this paper, we extend AlphaZero to multiplayer IIGs by developing a new MCTS method, Action-Prediction MCTS (AP-MCTS). In contrast to traditional MCTS extensions for IIGs, AP-MCTS first builds the search tree based on public information, adopts the policy-value network to generalize between hidden states, and finally predicts other players' actions directly. This design bypasses the inefficiency of sampling and the difficulty of predicting the state of other players. We conduct extensive experiments on the popular 3-player poker game DouDiZhu to evaluate the performance of AP-MCTS combined with the framework AlphaZero. When playing against experienced human players, AP-MCTS achieved a 65.65\% winning rate, which is almost twice the human's winning rate. When comparing with state-of-the-art DouDiZhu AIs, the Elo rating of AP-MCTS is 50 to 200 higher than them. The ablation study shows that accurate action prediction is the key to AP-MCTS winning.
doi:10.24963/ijcai.2021/470 fatcat:jtspuvcoh5hhteq52u3m56rsdu