A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf
.
Enabling Fast and Flexible Distributed Deep Learning with Programmable Switches
[article]
2022
arXiv
pre-print
Deep learning has been used in a wide range of areas and made a huge breakthrough. With the ever-increasing model size and train-ing data volume, distributed deep learning emerges which utilizes a cluster to train a model in parallel. Unfortunately, the performance is often far from linear speedup due to the communication overhead between cluster nodes. To address this challenge, this paper designs and implements Libra, a network aggregator, that utilizes in-network computation to optimize the
arXiv:2205.05243v2
fatcat:2ui7enlpkrdvpbninwrqxdyo3q