A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2022.
The file type is application/pdf.
Srifty: Swift and Thrifty Distributed Training on the Cloud
[article]
2022
arXiv
pre-print
Finding the best VM configuration is key to achieving lower cost and higher throughput, the two primary concerns in cloud-based distributed neural network (NN) training today. Optimal VM selection that meets user constraints requires efficiently navigating a large search space while controlling for the performance variance associated with sharing cloud instances and networks. In this work, we characterize this variance in the context of distributed NN training and present results of a comprehensive
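To make the selection problem concrete, here is a minimal sketch of constrained VM selection that accounts for performance variance. It is not the Srifty method, whose details are not given in this excerpt. It is a generic illustration under stated assumptions: each candidate configuration has a known hourly price and a few repeated throughput measurements, and variance is handled by ranking on a pessimistic estimate (mean minus `k` standard deviations). The names `VMConfig`, `conservative_throughput`, `cost_per_sample`, and `select` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional


@dataclass
class VMConfig:
    # Hypothetical description of one candidate cluster configuration.
    name: str
    hourly_price: float   # price of the whole cluster per hour
    samples: list[float]  # throughput (training samples/s) from repeated runs


def conservative_throughput(cfg: VMConfig, k: float = 1.0) -> float:
    """Pessimistic throughput: mean minus k standard deviations.

    With fewer than two measurements no spread can be estimated,
    so the plain mean is returned.
    """
    if len(cfg.samples) < 2:
        return mean(cfg.samples)
    return mean(cfg.samples) - k * stdev(cfg.samples)


def cost_per_sample(cfg: VMConfig, k: float = 1.0) -> float:
    """Price per training sample under the pessimistic throughput estimate."""
    tput = conservative_throughput(cfg, k)
    if tput <= 0:
        return float("inf")
    return cfg.hourly_price / (tput * 3600)


def select(configs: list[VMConfig], min_throughput: float,
           k: float = 1.0) -> Optional[VMConfig]:
    """Cheapest-per-sample config whose pessimistic throughput meets the target."""
    feasible = [c for c in configs
                if conservative_throughput(c, k) >= min_throughput]
    if not feasible:
        return None
    return min(feasible, key=lambda c: cost_per_sample(c, k))
```

With `k = 0` the sketch ranks purely on mean throughput. Raising `k` penalizes configurations whose measured throughput fluctuates, for example because of shared instances or networks. As a result, a configuration that looks cheapest on average can be rejected in favor of a steadier one. An exhaustive scan like this is only practical for a small candidate set; the large search space mentioned above is what makes this step hard in practice.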
arXiv:2011.14243v3
fatcat:d5tkgfipgzbrnbegqwqxdf3hoa