Systems for training massive deep learning models (billions of parameters) today assume and require specialized "hyper-clusters": hundreds or thousands of GPUs wired with specialized high-bandwidth interconnects ... The code for Varuna is available at https://github.com/microsoft/varuna.

... A deep learning model is divided into a number of layers. Thus, a natural way to split a model among multiple GPUs is to distribute different layers among them. ...

arXiv:2111.04007v2
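The layer-wise split described above can be sketched as a simple partitioning step: assign contiguous ranges of layer indices to the available GPUs, keeping the ranges as balanced as possible. This is a minimal illustration, not Varuna's actual partitioning algorithm; the function name `partition_layers` and the equal-split heuristic are assumptions for the example.

```python
def partition_layers(num_layers, num_gpus):
    """Split layer indices 0..num_layers-1 into contiguous, near-equal
    chunks, one chunk per GPU (a hypothetical balanced-split heuristic)."""
    base, extra = divmod(num_layers, num_gpus)
    partitions, start = [], 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional layer each.
        size = base + (1 if gpu < extra else 0)
        partitions.append(list(range(start, start + size)))
        start += size
    return partitions

# A 10-layer model across 4 GPUs: chunks of sizes 3, 3, 2, 2.
print(partition_layers(10, 4))
```

In a real pipeline-parallel setup, each chunk would then be placed on its own device and activations would flow between adjacent chunks; this sketch shows only the index assignment.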