Systems for training massive deep learning models (billions of parameters) today assume and require specialized "hyper-clusters": hundreds or thousands of GPUs wired with specialized high-bandwidth interconnects ... The code for Varuna is available at https://github.com/microsoft/varuna.

... A deep learning model is divided into a number of layers. Thus, a natural way to split a model among multiple GPUs is to distribute different layers among them. ...

arXiv:2111.04007v2
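The layer-wise split described above can be sketched as a simple partitioning step: assign contiguous ranges of layer indices to the available GPUs, keeping the ranges as balanced as possible. This is a minimal illustration, not Varuna's actual partitioning algorithm; the function name `partition_layers` and the equal-split heuristic are assumptions for the example.

```python
def partition_layers(num_layers, num_gpus):
    """Split layer indices 0..num_layers-1 into contiguous, near-equal
    chunks, one chunk per GPU (a hypothetical balanced-split heuristic)."""
    base, extra = divmod(num_layers, num_gpus)
    partitions, start = [], 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional layer each.
        size = base + (1 if gpu < extra else 0)
        partitions.append(list(range(start, start + size)))
        start += size
    return partitions

# A 10-layer model across 4 GPUs: chunks of sizes 3, 3, 2, 2.
print(partition_layers(10, 4))
```

In a real pipeline-parallel setup, each chunk would then be placed on its own device and activations would flow between adjacent chunks; this sketch shows only the index assignment.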