Distributed Training of Deep Neural Networks with Spark: The MareNostrum Experience
2019
Pattern Recognition Letters
Deploying a distributed deep learning technology stack on a large parallel system is a very complex process that involves integrating and configuring several layers of both general-purpose and custom software. The details of such deployments are rarely described in the literature. This paper presents the experience gained during the deployment of a technology stack to enable deep learning workloads on MareNostrum, a petascale supercomputer. The components of a layered
doi:10.1016/j.patrec.2019.01.020
fatcat:47iumueflfdvbdgnml6wux3qyi