TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
[article] arXiv pre-print, 2021
Model parallelism has become a necessity for training modern large-scale deep language models. In this work, we identify a new dimension, orthogonal to existing model-parallel approaches: for Transformer-based language models, pipeline parallelism can be performed within a single training sequence thanks to the autoregressive property of such models. This enables a more fine-grained pipeline than previous work. With this key idea, we design TeraPipe, a high-performance token-level pipeline parallel training algorithm for Transformer-based language models.
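The idea in the abstract is that causal (autoregressive) attention lets a pipeline stage hand off a slice of tokens to the next stage as soon as it is done with it, since slice i only depends on slices ≤ i; stages then overlap on different slices of the same sequence. The sketch below is a minimal, illustrative schedule simulation, not the authors' implementation; `num_stages` and `num_slices` are assumed parameters chosen for the example.

```python
# Minimal sketch of a token-level pipeline schedule (illustrative only).
# One training sequence is split into token slices; pipeline stages are
# groups of Transformer layers placed on different devices.

num_stages = 4   # assumed: pipeline stages (layer groups across devices)
num_slices = 6   # assumed: token slices within a single sequence

# At time step t, stage s works on the slice that stage s-1 finished at
# step t-1. Causal attention makes this valid: computing slice i at any
# stage needs only the activations of slices <= i from the stage before.
for step in range(num_stages + num_slices - 1):
    active = [
        f"stage {s}: slice {step - s}"
        for s in range(num_stages)
        if 0 <= step - s < num_slices
    ]
    print(f"t={step}: " + ", ".join(active))
```

Running the sketch shows the whole sequence finishing in num_stages + num_slices - 1 = 9 steps rather than the 24 steps a stage-by-stage pass over the full sequence would take, which is the fine-grained overlap the abstract refers to.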
arXiv:2102.07988v2
fatcat:tfzfivgpwnhpdhxfq5r45aiiya