DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models
2020
2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)
In the age of big data, deep learning has emerged as a powerful tool to extract insight and exploit its value, both in industry and scientific applications. One common pattern emerging in such applications is frequent checkpointing of the state of the learning model during training, needed in a variety of scenarios: analysis of intermediate states to explain features and correlations with training data, exploration strategies involving alternative models that share a common ancestor, knowledge
doi:10.1109/ccgrid49817.2020.00-76
dblp:conf/ccgrid/NicolaeLWBDC20
fatcat:s4565nfzczhfzmk4gir3tgkt64