Edinburgh's Submissions to the 2020 Machine Translation Efficiency Task
2020
Proceedings of the Fourth Workshop on Neural Generation and Translation
We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multi-core CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and multiple processes with affinity for the multi-core setting.
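The Simpler Simple Recurrent Unit (SSRU) named in the abstract is the lightweight recurrence, introduced by Kim et al. (2019), that replaces self-attention in the student decoders. The following is only a minimal NumPy sketch of that recurrence, not the authors' Marian implementation; the function names and shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ssru_step(x_t, c_prev, W, W_f, b_f):
    """One SSRU time step (sketch): a single forget gate, no reset gate
    and no h_{t-1} term, with ReLU in place of tanh on the output."""
    f_t = sigmoid(W_f @ x_t + b_f)                  # forget gate
    c_t = f_t * c_prev + (1.0 - f_t) * (W @ x_t)    # gated cell update
    h_t = np.maximum(c_t, 0.0)                      # output = ReLU(c_t)
    return h_t, c_t
```

Because neither the gate nor the cell input depends on the previous output h_{t-1}, the matrix products W x_t and W_f x_t can be precomputed for all time steps in one batched multiply, leaving only cheap element-wise work in the sequential loop; this is what makes the unit fast at inference.

Similarly, the "customized 8-bit quantization" on CPU refers to running the matrix multiplies in int8 with int32 accumulation. A hedged sketch of plain symmetric quantization follows; the submission's actual scheme (scale granularity, kernels) may differ.

```python
def quantize_int8(a):
    """Symmetric per-tensor quantization of a float32 array to int8 (sketch)."""
    scale = 127.0 / np.max(np.abs(a))
    return np.round(a * scale).astype(np.int8), scale

def int8_matmul(W, x):
    """int8 x int8 matmul with int32 accumulation, rescaled to float32."""
    W_q, sw = quantize_int8(W)
    x_q, sx = quantize_int8(x)
    y = W_q.astype(np.int32) @ x_q.astype(np.int32)
    return y.astype(np.float32) / (sw * sx)
```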
doi:10.18653/v1/2020.ngt-1.26
dblp:conf/aclnmt/BogoychevGABHKF20
fatcat:eur3imxnbbcanpym3l7zewvd6a