Hint-Based Training for Non-Autoregressive Machine Translation

Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, Tie-Yan Liu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency. Non-AutoRegressive Translation (NART) models were proposed to reduce the inference time, but could only achieve inferior translation accuracy. In this paper, we propose a novel approach that leverages hints from hidden states and word alignments to help the training of NART models.
Our results achieve a significant improvement over previous NART models on the WMT14 En-De and De-En datasets and are even comparable to a strong LSTM-based ART baseline, while being one order of magnitude faster in inference.
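As a rough illustration of why non-autoregressive decoding is faster (a toy sketch, not the paper's implementation): autoregressive decoding requires one sequential decoder call per output token, while non-autoregressive decoding predicts all positions in a single call. All function names and the stand-in "models" below are hypothetical.

```python
# Hypothetical sketch: autoregressive decoding runs T sequential decoder
# steps, while non-autoregressive decoding produces all T tokens at once.

def autoregressive_decode(step_fn, length):
    """Generate tokens one at a time; each step conditions on prior output."""
    tokens = []
    for _ in range(length):
        tokens.append(step_fn(tokens))  # one decoder call per token
    return tokens, length  # `length` sequential decoder calls

def non_autoregressive_decode(batch_fn, length):
    """Predict every position independently in a single decoder call."""
    return batch_fn(length), 1  # one sequential decoder call in total

# Toy "models" that emit position indices as stand-in tokens.
ar_tokens, ar_steps = autoregressive_decode(lambda prev: len(prev), 5)
nar_tokens, nar_steps = non_autoregressive_decode(lambda n: list(range(n)), 5)
assert ar_tokens == nar_tokens == [0, 1, 2, 3, 4]
assert ar_steps == 5 and nar_steps == 1
```

With a real Transformer decoder, the per-token calls in the autoregressive loop cannot be parallelized, which is the source of the latency gap the paper targets.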
doi:10.18653/v1/d19-1573 dblp:conf/emnlp/LiLHTQWL19