Varuna: Scalable, Low-cost Training of Massive Deep Learning Models [article]

Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, Nipun Kwatra
2021 arXiv pre-print
Systems for training massive deep learning models (billions of parameters) today assume and require specialized "hyper-clusters": hundreds or thousands of GPUs wired with specialized high-bandwidth interconnects ... The code for Varuna is available at https://github.com/microsoft/varuna. ... A deep learning model is divided into a number of layers. Thus, a natural way to split a model among multiple GPUs is to distribute different layers among them. ...
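The snippet's idea of splitting a model by distributing its layers across GPUs can be illustrated with a minimal partitioning sketch. This is not Varuna's implementation; the function name and the contiguous near-equal split are assumptions made for illustration only.

```python
# Minimal sketch (hypothetical, not Varuna's code): assign contiguous
# blocks of layer indices to each GPU, balancing block sizes.

def partition_layers(num_layers, num_gpus):
    """Split layer indices 0..num_layers-1 into num_gpus contiguous,
    near-equal blocks, one block per GPU."""
    base, extra = divmod(num_layers, num_gpus)
    partitions, start = [], 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional layer each.
        size = base + (1 if gpu < extra else 0)
        partitions.append(list(range(start, start + size)))
        start += size
    return partitions

# Example: 10 layers over 4 GPUs.
print(partition_layers(10, 4))
# -> [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

In a real pipeline-parallel system each block would be placed on its own device and activations passed between consecutive blocks; the abstract notes Varuna targets commodity interconnects rather than specialized hyper-clusters.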
arXiv:2111.04007v2 fatcat:mmmm5shpq5ey5fgfxklnrouoqe