Speeding Up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary

Xing Shi, Kevin Knight
2017 Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)  
We speed up Neural Machine Translation (NMT) decoding by shrinking the run-time target vocabulary. We experiment with two shrinking approaches: Locality Sensitive Hashing (LSH) and word alignments. Using the latter method, we get a 2x overall speed-up over a highly optimized GPU implementation, without hurting BLEU. On certain low-resource language pairs, the same methods improve BLEU by 0.5 points. We also report a negative result for LSH on GPUs, due to relatively large overhead, though it was successful on CPUs. Compared with LSH, decoding with word alignments is GPU-friendly, orthogonal to existing speedup methods, and more robust across language pairs.
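The word-alignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the alignment table, the set of always-kept frequent words, and the toy weights are all hypothetical, and the sketch assumes only that each source word maps to a short list of candidate target word ids, so the output softmax is evaluated over the union of those candidates rather than over the full target vocabulary.

```python
import math

def candidate_vocab(source_words, align_table, common_words):
    """Collect target word ids: always-kept frequent words plus the
    aligned candidates of every source word (hypothetical table)."""
    candidates = set(common_words)
    for word in source_words:
        candidates.update(align_table.get(word, ()))
    return sorted(candidates)

def restricted_softmax(hidden, out_weights, out_bias, candidates):
    """Softmax over the candidate rows of the output layer only."""
    logits = [
        sum(h * w for h, w in zip(hidden, out_weights[i])) + out_bias[i]
        for i in candidates
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {i: e / total for i, e in zip(candidates, exps)}

# Toy setup (assumed values): 8 target words, hidden size 2.
align_table = {"das": [0, 5], "haus": [3, 4]}
common_words = [0, 1]
out_weights = [[0.1 * i, -0.05 * i] for i in range(8)]
out_bias = [0.0] * 8
hidden = [1.0, 2.0]

cands = candidate_vocab(["das", "haus"], align_table, common_words)
probs = restricted_softmax(hidden, out_weights, out_bias, cands)
```

In this toy case only 5 of the 8 output rows are scored per decoding step. Any saving in a real system would depend on how small the candidate set is relative to the full vocabulary.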
doi:10.18653/v1/p17-2091 dblp:conf/acl/ShiK17 fatcat:dvp2majbynfrveuzw44wgvl7ky