Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
The performance of autoregressive models on natural language generation tasks has improved dramatically due to the adoption of deep, self-attentive architectures. However, these gains have come at the cost of slower inference, making state-of-the-art models cumbersome to deploy in real-world, time-sensitive settings. We develop a compression technique for autoregressive models that is driven by an imitation learning perspective on knowledge distillation. The algorithm is designed to …

doi:10.18653/v1/2020.emnlp-main.494
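The abstract is cut off before it describes the algorithm, so the following is only a generic sketch of what "knowledge distillation from an imitation learning perspective" can mean. It is not the authors' method. It treats the large teacher model as an expert and supervises the small student on prefixes produced by a DAgger-style roll-in: at each step the next token comes from the teacher with probability `beta`, otherwise from the student. This exposes the student to prefixes like the ones it will generate itself at inference time. At every prefix the loss is KL(teacher || student), which differs from word-level distillation cross-entropy only by the teacher's entropy. All names (`imitation_distill_loss`, `toy_teacher`, `toy_student`, `beta`) are hypothetical. The sketch computes only the loss value and does no gradient updates.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical interface: a policy maps a token prefix to a distribution over a small vocabulary.
Policy = Callable[[Sequence[int]], List[float]]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def sample(dist: Sequence[float], rng: random.Random) -> int:
    """Draw one token index from a categorical distribution."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def imitation_distill_loss(teacher: Policy, student: Policy, max_len: int,
                           beta: float, rng: random.Random) -> float:
    """Average per-step KL(teacher || student) along one rolled-in sequence.

    Roll-in mixes the two policies: with probability beta the teacher picks the
    next token, otherwise the student does. The teacher's distribution is the
    supervision target at every visited prefix.
    """
    prefix: List[int] = []
    total = 0.0
    for _ in range(max_len):
        p_teacher = teacher(prefix)
        p_student = student(prefix)
        # Expert (teacher) supervision at the current prefix.
        total += kl_divergence(p_teacher, p_student)
        # Choose which policy extends the prefix.
        rollin_dist = p_teacher if rng.random() < beta else p_student
        prefix.append(sample(rollin_dist, rng))
    return total / max_len


def toy_teacher(prefix: Sequence[int]) -> List[float]:
    """Hypothetical 3-token teacher that favors token len(prefix) % 3."""
    favored = len(prefix) % 3
    return [0.8 if i == favored else 0.1 for i in range(3)]


def toy_student(prefix: Sequence[int]) -> List[float]:
    """Hypothetical untrained student: uniform over the 3 tokens."""
    return [1 / 3, 1 / 3, 1 / 3]


if __name__ == "__main__":
    loss = imitation_distill_loss(toy_teacher, toy_student, max_len=5,
                                  beta=0.5, rng=random.Random(0))
    print(f"distillation loss: {loss:.4f}")
```

In a real setup, the student's distributions would come from a trainable network, and this loss would be minimized by gradient descent. `beta` is typically decayed over training so that roll-in gradually shifts from the teacher to the student.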