277 Hits in 4.3 sec

On the Acceleration of Deep Learning Model Parallelism with Staleness [article]

An Xu, Zhouyuan Huo, Heng Huang
2022 arXiv   pre-print
In this paper, we propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), to address these challenges.  ...  We also analyze the convergence of DSP with two popular gradient-based methods and prove that both of them are guaranteed to converge to critical points for non-convex problems.  ...  Based on the concept of Layer-wise Staleness, we propose a novel parallel CNN training algorithm named as Diversely Stale Parameters (DSP), where lower layers use more stale information to update parameters  ... 
arXiv:1909.02625v3 fatcat:3go6lc3u7fdfxlu4tcbxvpu6ea

On the Acceleration of Deep Learning Model Parallelism With Staleness

An Xu, Zhouyuan Huo, Heng Huang
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), to address these challenges.  ...  We also analyze the convergence of DSP with two popular gradient-based methods and prove that both of them are guaranteed to converge to critical points for non-convex problems.  ...  Based on the concept of Layer-wise Staleness, we propose a novel parallel CNN training algorithm named as Diversely Stale Parameters (DSP), where lower layers use more stale information to update parameters  ... 
doi:10.1109/cvpr42600.2020.00216 dblp:conf/cvpr/XuHH20 fatcat:t2yhanindbdmraiblkskqygxai
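
The snippet above only names the mechanism, so here is a minimal sketch of the layer-wise staleness idea it describes: lower layers apply older (more stale) gradients than higher layers. The toy quadratic loss, the delay schedule, and all variable names below are illustrative assumptions, not the authors' actual DSP algorithm.

# Sketch only: layer-wise staleness, where lower layers update from older
# gradients than higher layers. Not the DSP algorithm from the paper.
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
num_layers, dim, lr = 4, 8, 0.1
params = [rng.normal(size=dim) for _ in range(num_layers)]

# Lower layers (smaller index) use more stale information: larger delay.
delays = [num_layers - 1 - i for i in range(num_layers)]
buffers = [deque(maxlen=d + 1) for d in delays]

def grad(p):
    # Gradient of a toy quadratic loss 0.5 * ||p||^2 (a stand-in for backprop).
    return p

for step in range(200):
    for i in range(num_layers):
        buffers[i].append(grad(params[i]))                # freshest gradient enters the buffer
        if len(buffers[i]) == buffers[i].maxlen:
            params[i] = params[i] - lr * buffers[i][0]    # oldest buffered gradient is applied

print([round(float(np.linalg.norm(p)), 4) for p in params])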

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs [article]

Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré
2016 arXiv   pre-print
For our third contribution, we use our novel understanding of the interaction between system and optimization dynamics to provide an efficient hyperparameter optimizer.  ...  Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs.  ...  For each staleness, the algorithm searches for optimal settings of momentum and learning rate by running each configuration of parameters for 1 minute, and selecting the parameters with the lowest loss  ... 
arXiv:1606.04487v4 fatcat:qwr4xcuz45d3hoj3kxjius3q6y
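
As a rough illustration of the search described in the snippet (one short budgeted run per momentum/learning-rate setting at each staleness level, keeping the configuration with the lowest loss), the sketch below uses a hypothetical train_for_budget stand-in; it is not Omnivore's actual optimizer.

# Sketch of a budgeted grid search over momentum and learning rate per
# staleness setting. train_for_budget is a hypothetical placeholder.
import itertools, random

def train_for_budget(staleness, momentum, lr, seconds=60):
    # Placeholder: pretend to train for `seconds` and return a final loss.
    random.seed(hash((staleness, momentum, lr)) % (2 ** 32))
    return abs(lr - 0.05) + abs(momentum - 0.9) + 0.01 * staleness + 0.01 * random.random()

def tune(staleness, momenta=(0.0, 0.5, 0.9), lrs=(0.1, 0.05, 0.01)):
    # Run every (momentum, lr) configuration briefly and keep the lowest loss.
    best_loss, best_m, best_lr = min(
        (train_for_budget(staleness, m, lr), m, lr)
        for m, lr in itertools.product(momenta, lrs)
    )
    return {"staleness": staleness, "loss": round(best_loss, 4),
            "momentum": best_m, "lr": best_lr}

for s in (0, 4, 16):
    print(tune(s))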

Toward Understanding the Impact of Staleness in Distributed Machine Learning [article]

Wei Dai, Yi Zhou, Nanqing Dong, Hao Zhang, Eric P. Xing
2018 arXiv   pre-print
Our extensive experiments reveal the rich diversity of the effects of staleness on the convergence of ML algorithms and offer insights into seemingly contradictory reports in the literature.  ...  Many distributed machine learning (ML) systems adopt the non-synchronous execution in order to alleviate the network communication bottleneck, resulting in stale parameters that do not reflect the latest  ...  We study the impact of staleness on a diverse set of models: Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), multi-class Logistic Regression (MLR), Matrix Factorization (MF), Latent  ... 
arXiv:1810.03264v1 fatcat:drjb6zxycnawpgd7j62r4d7gbu
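
For readers unfamiliar with the setting, the staleness model mentioned in the abstract can be written as theta_{t+1} = theta_t - lr * grad(theta_{t-s}): the gradient applied at step t was computed on parameters from s steps earlier. The toy 1-D quadratic below is an assumption used only to show how larger s slows convergence; it is not the paper's experimental setup.

# Sketch of delayed-gradient SGD on f(theta) = 0.5 * theta**2.
def run(staleness, lr=0.05, steps=300, theta0=5.0):
    history = [theta0]
    theta = theta0
    for t in range(steps):
        stale_theta = history[max(0, t - staleness)]   # parameters the slow worker saw
        theta = theta - lr * stale_theta               # grad of 0.5 * theta**2 is theta
        history.append(theta)
    return theta

for s in (0, 4, 16):
    print(f"staleness={s:2d}  |theta| after training = {abs(run(s)):.6f}")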

GaDei: On Scale-up Training As A Service For Deep Learning [article]

Wei Zhang, Minwei Feng, Yunhui Zheng, Yufei Ren, Yandong Wang, Ji Liu, Peng Liu, Bing Xiang, Li Zhang, Bowen Zhou, Fei Wang
2017 arXiv   pre-print
The unique challenge of TaaS is that it must satisfy a wide range of customers who have no experience and resources to tune DL hyper-parameters, and meticulous tuning for each user's dataset is prohibitively  ...  Deep learning (DL) training-as-a-service (TaaS) is an important emerging industrial workload.  ...  GaDei enables efficient multi-learner training for arbitrary type of neural networks (e.g., CNN, RNN). We further verify the correctness of the GaDei's communication protocol.  ... 
arXiv:1611.06213v2 fatcat:esgbeep4hnebhjgykh66mb5zay

Blockchain-Enabled Asynchronous Federated Learning in Edge Computing

Yinghui Liu, Youyang Qu, Chenhao Xu, Zhicheng Hao, Bruce Gu
2021 Sensors  
However, the diversity of computing power and data sizes leads to a significant difference in local training data consumption, and thereby causes the inefficiency of FL.  ...  However, privacy issues during data collection for ML tasks raise extensive concerns.  ...  Acknowledgments: We sincerely thank Lei Cui and Yufeng Xing for contributing their time and efforts to improve this paper. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/s21103335 pmid:34064942 fatcat:krb6q5lcnrgwdezk3hgwhh4iai

Data-Aware Device Scheduling for Federated Edge Learning [article]

Afaf Taik, Zoubeir Mlika, Soumaya Cherkaoui
2022 arXiv   pre-print
Therefore, a careful scheduling of a subset of devices for training and uploading models is necessary.  ...  We first define a general framework for the data-aware scheduling and the main axes and requirements for diversity evaluation.  ...  ACKNOWLEDGEMENT The authors would like to thank the Natural Sciences and Engineering Research Council of Canada, for the financial support of this research.  ... 
arXiv:2102.09491v2 fatcat:tooxdxbrazg63iedie4kualsbm

BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search [article]

Xiangning Xie, Yuqiao Liu, Yanan Sun, Gary G. Yen, Bing Xue, Mengjie Zhang
2021 arXiv   pre-print
Unfortunately, the issues of fair comparisons and efficient evaluations have hindered the development of ENAS.  ...  The existing efficient evaluation methods are either not suitable for the population-based ENAS algorithm or are too complex to use.  ...  Every node hosts a copy of the DNN and trains the DNN on a subset of the dataset as shown in Fig. 1 . Then, the values of the parameters are sent to the parameter server.  ... 
arXiv:2108.03856v2 fatcat:w5aa2taebzcsffi5iyhcxbx3qm
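
The snippet describes a standard data-parallel parameter-server pattern: every node trains a copy of the model on its own data subset and sends parameters to a central server. The sketch below illustrates that pattern with an assumed linear model and synchronous averaging; it is not BenchENAS code.

# Sketch: each node trains a replica on its data shard; the server averages.
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -3.0, 0.5])
X = rng.normal(size=(600, 3))
y = X @ true_w + 0.01 * rng.normal(size=600)
shards = np.array_split(np.arange(600), 4)            # one data subset per node

def local_train(w, idx, lr=0.05, epochs=20):
    for _ in range(epochs):
        g = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w = w - lr * g
    return w

w_server = np.zeros(3)
for _ in range(5):
    replicas = [local_train(w_server.copy(), idx) for idx in shards]  # each node trains a copy
    w_server = np.mean(replicas, axis=0)                              # parameter server averages
print(np.round(w_server, 3))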

FederatedScope: A Flexible Federated Learning Platform for Heterogeneity [article]

Yuexiang Xie, Zhen Wang, Dawei Gao, Daoyuan Chen, Liuyi Yao, Weirui Kuang, Yaliang Li, Bolin Ding, Jingren Zhou
2022 arXiv   pre-print
Towards an easy-to-use and flexible platform, FederatedScope enables rich types of plug-in operations and components for efficient further development, and we have implemented several important components  ...  training strategies.  ...  With FL, a CNN with two convolutional layers is trained for image classification task on this dataset. CIFAR-10.  ... 
arXiv:2204.05011v4 fatcat:t5dba2hz7jbplgmnnoiqhlnjfy

Zero-Shot Temporal Action Detection via Vision-Language Prompting [article]

Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang
2022 arXiv   pre-print
Collecting and annotating a large training set for each class of interest is costly and hence unscalable.  ...  The PyTorch implementation of STALE is available at https://github.com/sauradip/STALE.  ...  with fewer extra FLOPs and parameters.  ... 
arXiv:2207.08184v1 fatcat:5op7uk6gqzcvzilrcbyczqvqr4

Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools [article]

Ruben Mayer, Hans-Arno Jacobsen
2019 arXiv   pre-print
This incorporates infrastructures for DL, methods for parallel DL training, multi-tenant resource scheduling and the management of training and model data.  ...  One of the reasons for this success is the increasing size of DL models and the proliferation of vast amounts of training data being available.  ...  They train two CNNs: one of the CNNs predicts the label while the other predicts the noise type of the training data set. For training, they first pre-train both CNNs with clean training data.  ... 
arXiv:1903.11314v2 fatcat:y62z7mteyzeq5kenb7srwtlg7q

AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training [article]

Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang, Kailash Gopalakrishnan
2017 arXiv   pre-print
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained.  ...  (SGD with momentum, Adam) and network parameters (number of learners, minibatch-size etc.).  ...  Acknowledgments The authors would like to thank Naigang Wang, Vijayalakshmi Srinivasan, Swagath Venkataramani, Pritish Narayanan, and I-Hsin Chung for helpful discussions and supports.  ... 
arXiv:1712.02679v1 fatcat:xjfy7efqmrctfa6dnaqvp5qhya
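
As a hedged illustration of residual gradient compression in general (not AdaComp's actual adaptive selection rule), the sketch below sends only the k largest-magnitude entries of the residual-accumulated gradient and carries the unsent remainder forward to the next step.

# Sketch of residual top-k gradient compression; the top-k rule is an
# assumption, not AdaComp's adaptive criterion.
import numpy as np

def compress_step(grad, residual, k):
    acc = residual + grad                       # fold the new gradient into the residual
    idx = np.argsort(np.abs(acc))[-k:]          # keep the k largest-magnitude entries
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]                      # this sparse vector is what gets communicated
    return sparse, acc - sparse                 # the unsent part becomes the new residual

rng = np.random.default_rng(0)
residual = np.zeros(1000)
for step in range(10):
    grad = rng.normal(size=1000)
    sparse, residual = compress_step(grad, residual, k=10)
    print(step, f"sent {np.count_nonzero(sparse)} / 1000 values,",
          f"residual norm {np.linalg.norm(residual):.2f}")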

Towards Efficient and Stable K-Asynchronous Federated Learning with Unbounded Stale Gradients on Non-IID Data

Zihao Zhou, Yanan Li, Xuebin Ren, Shusen Yang
2022 IEEE Transactions on Parallel and Distributed Systems  
We also present the convergence analysis for WKAFL under the assumption of unbounded staleness to understand the impact of staleness and non-IID data.  ...  By selecting consistent gradients and adjusting learning rate adaptively, WKAFL utilizes stale gradients and mitigates the impact of non-IID data, which can achieve multifaceted enhancement in training  ...  For EMNIST MNIST, the same CNN model as for EMNIST ByClass is deployed. For CIFAR10, we adopted LeNet [54], a simple and classical CNN model.  ... 
doi:10.1109/tpds.2022.3150579 fatcat:r7dp3afyl5htdknvoa33pabeky
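
The two ideas named in the snippet, selecting consistent gradients and down-weighting stale ones, can be illustrated roughly as below. The cosine-similarity filter and the 1/(1+staleness) weighting are assumptions chosen for clarity; they are not the WKAFL algorithm itself.

# Sketch: drop gradients that conflict with a reference direction, then
# average the rest with staleness-dependent weights.
import numpy as np

def aggregate(updates, reference, cos_threshold=0.0):
    """updates: list of (gradient, staleness) pairs from asynchronous clients."""
    total, weight_sum = np.zeros_like(reference), 0.0
    for g, staleness in updates:
        cos = g @ reference / (np.linalg.norm(g) * np.linalg.norm(reference) + 1e-12)
        if cos < cos_threshold:                 # discard inconsistent (conflicting) gradients
            continue
        w = 1.0 / (1.0 + staleness)             # staler gradients get smaller weight
        total += w * g
        weight_sum += w
    return total / max(weight_sum, 1e-12)

rng = np.random.default_rng(0)
reference = np.array([1.0, 0.0])
updates = [(reference + 0.1 * rng.normal(size=2), s) for s in (0, 3, 10)]
updates.append((-reference, 1))                 # a conflicting client update gets filtered out
print(aggregate(updates, reference))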

AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training

Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang, Kailash Gopalakrishnan
2018 Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained.  ...  (SGD with momentum, Adam) and network parameters (number of learners, minibatch-size etc.).  ...  Acknowledgments The authors would like to thank Naigang Wang, Vijayalakshmi Srinivasan, Swagath Venkataramani, Pritish Narayanan, and I-Hsin Chung for helpful discussions and supports.  ... 
doi:10.1609/aaai.v32i1.11728 fatcat:4uogjg4vcbazvmvvge4ntdppk4

Asynchronous Federated Optimization [article]

Cong Xie, Sanmi Koyejo, Indranil Gupta
2020 arXiv   pre-print
We prove that the proposed approach has near-linear convergence to a global optimum, for both strongly convex and a restricted family of non-convex problems.  ...  Federated learning enables training on a massive number of edge devices. To improve flexibility and scalability, we propose a new asynchronous federated optimization algorithm.  ...  Ideally, the larger amounts of training data from diverse users results in improved representation and generalization of machine-learning models.  ... 
arXiv:1903.03934v5 fatcat:wzti5vo4dne2pdnara5rgvwsvi
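
One common way to realize the asynchronous server update sketched in the abstract is to blend each arriving client model into the server model with a weight that decays with staleness. The polynomial decay and the constants below are assumptions made for illustration, not necessarily the paper's choices.

# Sketch: asynchronous server-side mixing with a staleness-decayed weight.
import numpy as np

def mix(server_model, client_model, staleness, alpha=0.6, a=0.5):
    alpha_t = alpha * (1.0 + staleness) ** (-a)       # staler updates count for less
    return (1.0 - alpha_t) * server_model + alpha_t * client_model

server = np.zeros(4)
arrivals = [(np.ones(4), 0), (2 * np.ones(4), 5), (3 * np.ones(4), 20)]
for client_model, staleness in arrivals:              # clients arrive one at a time, asynchronously
    server = mix(server, client_model, staleness)
    print(f"staleness={staleness:2d}  server={np.round(server, 3)}")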
Showing results 1 — 15 out of 277 results