From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction

Zihang Dai, Qizhe Xie, Eduard Hovy
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018
In this work, we study the credit assignment problem in reward augmented maximum likelihood (RAML) learning, and establish a theoretical equivalence between the token-level counterpart of RAML and entropy-regularized reinforcement learning. Inspired by this connection, we propose two sequence prediction algorithms: one extending RAML with fine-grained credit assignment, and the other improving Actor-Critic with systematic entropy regularization. On two benchmark datasets, we show that the proposed algorithms outperform RAML and Actor-Critic respectively, providing new alternatives to sequence prediction.
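For orientation, the equivalence referenced in the abstract is usually stated at the sequence level; the following is a hedged sketch in standard notation (input x, reference y*, reward r(y, y*), temperature τ, model p_θ), not the paper's own token-level formulation.

\[
\mathcal{L}_{\mathrm{RAML}}(\theta) \;=\; -\,\mathbb{E}_{y \sim q(y \mid y^{*};\, \tau)}\big[\log p_{\theta}(y \mid x)\big],
\qquad
q(y \mid y^{*};\, \tau) \;\propto\; \exp\!\big(r(y, y^{*}) / \tau\big)
\]
\[
J_{\mathrm{ERL}}(\theta) \;=\; \mathbb{E}_{y \sim p_{\theta}(y \mid x)}\big[r(y, y^{*})\big] \;+\; \tau\,\mathcal{H}\big(p_{\theta}(\cdot \mid x)\big)
\]

Up to constants, maximizing \(J_{\mathrm{ERL}}\) minimizes \(\mathrm{KL}(p_{\theta} \,\|\, q)\), while RAML minimizes \(\mathrm{KL}(q \,\|\, p_{\theta})\); both objectives share the exponentiated-payoff distribution q as their global optimum, which is the sequence-level form of the connection the paper develops at the token level.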
doi:10.18653/v1/p18-1155