On the Acceleration of Deep Learning Model Parallelism with Staleness
[article]
2022
arXiv
pre-print
In this paper, we propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), to address these challenges. ...
We also analyze the convergence of DSP with two popular gradient-based methods and prove that both of them are guaranteed to converge to critical points for non-convex problems. ...
Based on the concept of Layer-wise Staleness, we propose a novel parallel CNN training algorithm named Diversely Stale Parameters (DSP), where lower layers use more stale information to update parameters ...
arXiv:1909.02625v3
fatcat:3go6lc3u7fdfxlu4tcbxvpu6ea
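The excerpt above states only that lower layers update with more stale information. The sketch below illustrates that notion of layer-wise staleness on a toy quadratic problem; the layer shapes, per-layer delays, and the loss itself are assumptions for illustration, not DSP's actual pipeline schedule.

```python
import numpy as np

# Toy illustration of layer-wise staleness: each layer's update is computed
# from a parameter snapshot taken `delay[l]` steps ago, with lower layers
# (closer to the input) using older snapshots. Shapes, delays, and the
# per-layer quadratic loss 0.5*||W||^2 are assumptions.
rng = np.random.default_rng(0)
num_layers = 4
params = [rng.normal(size=(8, 8)) for _ in range(num_layers)]
delay = [3, 2, 1, 0]                      # lower layers use more stale information
history = [[p.copy()] for p in params]    # per-layer parameter snapshots
lr = 0.1

for step in range(20):
    updated = []
    for l, p in enumerate(params):
        # Use the snapshot `delay[l]` steps back (clamped to the oldest one).
        stale = history[l][max(0, len(history[l]) - 1 - delay[l])]
        grad = stale                      # gradient of 0.5*||W||^2 at the stale point
        updated.append(p - lr * grad)
    params = updated
    for l, p in enumerate(params):
        history[l].append(p.copy())

print([round(float(np.linalg.norm(p)), 3) for p in params])
```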
On the Acceleration of Deep Learning Model Parallelism With Staleness
2020
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), to address these challenges. ...
We also analyze the convergence of DSP with two popular gradient-based methods and prove that both of them are guaranteed to converge to critical points for non-convex problems. ...
Based on the concept of Layer-wise Staleness, we propose a novel parallel CNN training algorithm named Diversely Stale Parameters (DSP), where lower layers use more stale information to update parameters ...
doi:10.1109/cvpr42600.2020.00216
dblp:conf/cvpr/XuHH20
fatcat:t2yhanindbdmraiblkskqygxai
Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
[article]
2016
arXiv
pre-print
For our third contribution, we use our novel understanding of the interaction between system and optimization dynamics to provide an efficient hyperparameter optimizer. ...
Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs. ...
For each staleness, the algorithm searches for optimal settings of momentum and learning rate by running each configuration of parameters for 1 minute, and selecting the parameters with the lowest loss ...
arXiv:1606.04487v4
fatcat:qwr4xcuz45d3hoj3kxjius3q6y
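The excerpt describes a simple tuner: for each staleness setting, run every (momentum, learning rate) configuration for about a minute and keep the one with the lowest loss. A hedged sketch of that loop follows; the candidate grid, the wall-clock budget, and the `train_for` helper are hypothetical stand-ins rather than Omnivore's implementation.

```python
import itertools
import time

def train_for(seconds, momentum, lr):
    """Hypothetical stand-in for a short training run; returns the loss reached.
    A real system would launch training with these settings for ~`seconds` of
    wall-clock time and report the final training loss."""
    deadline = time.time() + seconds
    loss = 1.0
    while time.time() < deadline:
        loss *= 1.0 - min(lr * (1.0 + momentum), 0.5)   # toy loss decay
        time.sleep(0.01)
    return loss

def tune(momenta, lrs, budget_s=60):
    """Run each (momentum, learning-rate) configuration briefly and keep the
    one that reaches the lowest loss, as described in the excerpt."""
    best = None
    for m, lr in itertools.product(momenta, lrs):
        loss = train_for(budget_s, m, lr)
        if best is None or loss < best[0]:
            best = (loss, m, lr)
    return best

# Example (shortened budget so the sketch finishes quickly):
# best_loss, best_momentum, best_lr = tune([0.0, 0.9], [0.01, 0.1], budget_s=1)
```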
Toward Understanding the Impact of Staleness in Distributed Machine Learning
[article]
2018
arXiv
pre-print
Our extensive experiments reveal the rich diversity of the effects of staleness on the convergence of ML algorithms and offer insights into seemingly contradictory reports in the literature. ...
Many distributed machine learning (ML) systems adopt the non-synchronous execution in order to alleviate the network communication bottleneck, resulting in stale parameters that do not reflect the latest ...
We study the impact of staleness on a diverse set of models: Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), multi-class Logistic Regression (MLR), Matrix Factorization (MF), Latent ...
arXiv:1810.03264v1
fatcat:drjb6zxycnawpgd7j62r4d7gbu
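To make the notion of "stale parameters" in such studies concrete, the sketch below delays every applied gradient by a fixed number of steps s on a toy quadratic objective; the objective, dimensions, and single-worker simulation are assumptions, not the paper's experimental setup.

```python
import numpy as np
from collections import deque

# Simulate staleness s: the gradient applied at step t was computed on the
# parameters from step t - s. The quadratic objective 0.5*||w||^2 and the
# dimensions are illustrative only.
rng = np.random.default_rng(1)
w = rng.normal(size=16)
s, lr = 4, 0.1
snapshots = deque([w.copy()], maxlen=s + 1)   # keeps the last s+1 iterates

for t in range(100):
    stale_w = snapshots[0]        # oldest retained snapshot, s steps behind
    grad = stale_w                # gradient of 0.5*||w||^2 at the stale point
    w = w - lr * grad
    snapshots.append(w.copy())

print(round(float(np.linalg.norm(w)), 6))     # shrinks despite the delayed gradients
```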
GaDei: On Scale-up Training As A Service For Deep Learning
[article]
2017
arXiv
pre-print
The unique challenge of TaaS is that it must satisfy a wide range of customers who have neither the experience nor the resources to tune DL hyper-parameters, and meticulous tuning for each user's dataset is prohibitively ...
Deep learning (DL) training-as-a-service (TaaS) is an important emerging industrial workload. ...
GaDei enables efficient multi-learner training for arbitrary types of neural networks (e.g., CNN, RNN). We further verify the correctness of GaDei's communication protocol. ...
arXiv:1611.06213v2
fatcat:esgbeep4hnebhjgykh66mb5zay
Blockchain-Enabled Asynchronous Federated Learning in Edge Computing
2021
Sensors
However, the diversity of computing power and data sizes leads to significant differences in local training data consumption, and thereby causes inefficiency in FL. ...
However, privacy issues during data collection for ML tasks raise extensive concerns. ...
Acknowledgments: We sincerely thank Lei Cui and Yufeng Xing for contributing their time and efforts to improve this paper.
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/s21103335
pmid:34064942
fatcat:krb6q5lcnrgwdezk3hgwhh4iai
Data-Aware Device Scheduling for Federated Edge Learning
[article]
2022
arXiv
pre-print
Therefore, a careful scheduling of a subset of devices for training and uploading models is necessary. ...
We first define a general framework for the data-aware scheduling and the main axes and requirements for diversity evaluation. ...
ACKNOWLEDGEMENT The authors would like to thank the Natural Sciences and Engineering Research Council of Canada, for the financial support of this research. ...
arXiv:2102.09491v2
fatcat:tooxdxbrazg63iedie4kualsbm
BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search
[article]
2021
arXiv
pre-print
Unfortunately, the issues of fair comparisons and efficient evaluations have hindered the development of ENAS. ...
The existing efficient evaluation methods are either not suitable for the population-based ENAS algorithm or too complex to use. ...
Every node hosts a copy of the DNN and trains it on a subset of the dataset, as shown in Fig. 1. Then, the values of the parameters are sent to the parameter server. ...
arXiv:2108.03856v2
fatcat:w5aa2taebzcsffi5iyhcxbx3qm
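The excerpt describes the usual data-parallel flow: each node trains a copy of the model on its shard and sends the resulting parameters to a parameter server, which combines them. The sketch below illustrates that flow with a toy least-squares model and synchronous averaging; the model, shard sizes, and averaging rule are assumptions, not BenchENAS's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def local_train(w, shard, lr=0.1):
    """One local pass of per-sample least-squares SGD on this node's shard."""
    X, y = shard
    for xi, yi in zip(X, y):
        w = w - lr * (xi @ w - yi) * xi
    return w

# Four nodes, each holding its own shard of a toy regression dataset.
d = 8
w_true = rng.normal(size=d)
shards = []
for _ in range(4):
    X = rng.normal(size=(32, d))
    shards.append((X, X @ w_true))

w_server = np.zeros(d)
for _ in range(10):
    # Each node trains a copy of the current server parameters on its shard,
    # then sends the resulting parameters back; the server averages them.
    local_params = [local_train(w_server.copy(), shard) for shard in shards]
    w_server = np.mean(local_params, axis=0)

print(round(float(np.linalg.norm(w_server - w_true)), 4))
```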
FederatedScope: A Flexible Federated Learning Platform for Heterogeneity
[article]
2022
arXiv
pre-print
Towards an easy-to-use and flexible platform, FederatedScope enables rich types of plug-in operations and components for efficient further development, and we have implemented several important components ...
training strategies. ...
With FL, a CNN with two convolutional layers is trained for an image classification task on this dataset. CIFAR-10. ...
arXiv:2204.05011v4
fatcat:t5dba2hz7jbplgmnnoiqhlnjfy
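The excerpt mentions training "a CNN with two convolutional layers" on CIFAR-10. A minimal PyTorch sketch of such a model is shown below; the channel counts, kernel sizes, and pooling/classifier layout are assumptions, since the excerpt does not specify the architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoConvCNN(nn.Module):
    """A small CNN with two convolutional layers for 32x32x3 inputs (e.g. CIFAR-10)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 32x32 -> 16x16
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # 16x16 -> 8x8
        return self.fc(x.flatten(1))

# model = TwoConvCNN()
# logits = model(torch.randn(4, 3, 32, 32))   # -> shape (4, 10)
```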
Zero-Shot Temporal Action Detection via Vision-Language Prompting
[article]
2022
arXiv
pre-print
Collecting and annotating a large training set for each class of interest is costly and hence unscalable. ...
The PyTorch implementation of STALE is available at https://github.com/sauradip/STALE. ...
with fewer extra FLOPs and parameters. ...
arXiv:2207.08184v1
fatcat:5op7uk6gqzcvzilrcbyczqvqr4
Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools
[article]
2019
arXiv
pre-print
This incorporates infrastructures for DL, methods for parallel DL training, multi-tenant resource scheduling and the management of training and model data. ...
One of the reasons for this success is the increasing size of DL models and the proliferation of vast amounts of training data being available. ...
They train two CNNs: one of the CNNs predicts the label while the other predicts the noise type of the training data set. For training, they first pre-train both CNNs with clean training data. ...
arXiv:1903.11314v2
fatcat:y62z7mteyzeq5kenb7srwtlg7q
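The excerpt summarizes a method that trains two CNNs, one predicting the label and one predicting the noise type, with both pre-trained on clean data. The sketch below shows one way that setup could look; the backbone, the assumed three noise types, the pre-training loop, and the loader names are illustrative assumptions, not the surveyed method's actual design.

```python
import torch
import torch.nn as nn

def make_cnn(num_outputs: int) -> nn.Module:
    """A tiny CNN backbone; the architecture is an illustrative assumption."""
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_outputs),
    )

label_net = make_cnn(num_outputs=10)   # predicts the class label
noise_net = make_cnn(num_outputs=3)    # predicts an assumed set of 3 noise types

def pretrain(net, clean_loader, epochs=1, lr=1e-3):
    """Pre-train a network on clean (trusted) data only, as the excerpt describes."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            loss_fn(net(x), y).backward()
            opt.step()

# pretrain(label_net, clean_label_loader); pretrain(noise_net, clean_noise_loader)
```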
AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
[article]
2017
arXiv
pre-print
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. ...
(SGD with momentum, Adam) and network parameters (number of learners, minibatch-size etc.). ...
Acknowledgments The authors would like to thank Naigang Wang, Vijayalakshmi Srinivasan, Swagath Venkataramani, Pritish Narayanan, and I-Hsin Chung for helpful discussions and supports. ...
arXiv:1712.02679v1
fatcat:xjfy7efqmrctfa6dnaqvp5qhya
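The excerpt does not spell out AdaComp's compression scheme, so the sketch below shows the generic residual (error-feedback) gradient-compression idea in the same spirit: add the unsent residual to the new gradient, transmit only the largest entries, and carry the remainder forward. The fixed top-k rule is an assumption, not AdaComp's adaptive scheme.

```python
import numpy as np

def compress_with_residual(grad, residual, k):
    """Generic residual (error-feedback) compression: send only the k largest-
    magnitude entries of grad + residual, carry the remainder to the next step.
    This sketches the general idea, not AdaComp's adaptive scheme."""
    acc = grad + residual
    idx = np.argpartition(np.abs(acc), -k)[-k:]   # indices of the k largest entries
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]
    new_residual = acc - sparse                   # unsent mass stays local
    return sparse, new_residual

rng = np.random.default_rng(3)
residual = np.zeros(1000)
for step in range(5):
    grad = rng.normal(size=1000)
    sent, residual = compress_with_residual(grad, residual, k=50)
    # `sent` is what would be communicated to the other learners / server.
```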
Towards Efficient and Stable K-Asynchronous Federated Learning with Unbounded Stale Gradients on Non-IID Data
2022
IEEE Transactions on Parallel and Distributed Systems
We also present the convergence analysis for WKAFL under the assumption of unbounded staleness to understand the impact of staleness and non-IID data. ...
By selecting consistent gradients and adjusting learning rate adaptively, WKAFL utilizes stale gradients and mitigates the impact of non-IID data, which can achieve multifaceted enhancement in training ...
For EMNIST MNIST, the same CNN model as for EMNIST ByClass is deployed. For CIFAR10, we adopted LeNet [54], a simple and classical CNN model. ...
doi:10.1109/tpds.2022.3150579
fatcat:r7dp3afyl5htdknvoa33pabeky
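WKAFL's exact rules for selecting consistent gradients and adapting the learning rate are only alluded to in the excerpt. The sketch below shows one simple form of the underlying idea: down-weight gradients by staleness and keep only those roughly consistent with their weighted average. The 1/(1+s) weighting, the cosine-similarity test, and the threshold are assumptions.

```python
import numpy as np

def aggregate_stale(grads, staleness, sim_threshold=0.0):
    """Down-weight gradients by staleness and drop those whose cosine similarity
    to the weighted average falls below a threshold. The 1/(1+s) weighting and
    the cosine test are illustrative assumptions, not WKAFL's exact rules."""
    grads = [np.asarray(g, dtype=float) for g in grads]
    weights = [1.0 / (1.0 + s) for s in staleness]
    ref = np.average(grads, axis=0, weights=weights)
    kept, kept_w = [], []
    for g, w in zip(grads, weights):
        cos = g @ ref / (np.linalg.norm(g) * np.linalg.norm(ref) + 1e-12)
        if cos >= sim_threshold:
            kept.append(w * g)
            kept_w.append(w)
    if not kept:
        return ref
    return np.sum(kept, axis=0) / np.sum(kept_w)

# Example: three worker gradients arriving with staleness 0, 4, and 10 steps.
g = aggregate_stale([np.ones(5), 2 * np.ones(5), -np.ones(5)], staleness=[0, 4, 10])
```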
AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
2018
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. ...
(SGD with momentum, Adam) and network parameters (number of learners, minibatch-size etc.). ...
Acknowledgments The authors would like to thank Naigang Wang, Vijayalakshmi Srinivasan, Swagath Venkataramani, Pritish Narayanan, and I-Hsin Chung for helpful discussions and supports. ...
doi:10.1609/aaai.v32i1.11728
fatcat:4uogjg4vcbazvmvvge4ntdppk4
Asynchronous Federated Optimization
[article]
2020
arXiv
pre-print
We prove that the proposed approach has near-linear convergence to a global optimum, for both strongly convex and a restricted family of non-convex problems. ...
Federated learning enables training on a massive number of edge devices. To improve flexibility and scalability, we propose a new asynchronous federated optimization algorithm. ...
Ideally, the larger amounts of training data from diverse users result in improved representation and generalization of machine-learning models. ...
arXiv:1903.03934v5
fatcat:wzti5vo4dne2pdnara5rgvwsvi
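The excerpt does not state the server-side update rule itself. A common form of asynchronous federated aggregation, sketched below, mixes each arriving client model into the server model with a weight that decays with the update's staleness; the polynomial decay and the base mixing rate are illustrative assumptions rather than the paper's exact choice.

```python
import numpy as np

def async_server_update(w_server, w_client, staleness, alpha=0.6, a=0.5):
    """Mix an arriving client model into the server model with a weight that
    decays polynomially in the update's staleness (illustrative choice)."""
    alpha_t = alpha * (1.0 + staleness) ** (-a)
    return (1.0 - alpha_t) * w_server + alpha_t * w_client

# Example: a client trained on a model snapshot taken 5 server steps ago.
w_server = np.zeros(10)
w_client = np.ones(10)
w_server = async_server_update(w_server, w_client, staleness=5)
```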
Showing results 1 — 15 out of 277 results