16 Hits in 4.7 sec

Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations [article]

Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi
2019 arXiv   pre-print
In this paper, we propose Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference  ...  We demonstrate that Qsparse-local-SGD converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers.  ...  Acknowledgement The authors gratefully thank Navjot Singh for his help with the experiments in the early stages of this work.  ... 
arXiv:1906.02367v2 fatcat:slaqwhyaa5gqha4n5phtgct54e

Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations

Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi
2020 IEEE Journal on Selected Areas in Information Theory  
In this paper, we propose Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference  ...  We demonstrate that Qsparse-local-SGD converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers.  ...  We develop Qsparse-local-SGD, a distributed SGD composing gradient quantization and explicit sparsification (e.g., Top k components), along with local iterations.  ... 
doi:10.1109/jsait.2020.2985917 fatcat:opvx6bss3vaujcfulpycwc5kpm
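
The snippet above describes composing aggressive sparsification with quantization and an error-compensation memory. Below is a minimal, illustrative sketch of that composition (Top-k followed by a simple sign-scale quantizer, with the untransmitted residual carried forward); it is not the authors' implementation, and the specific quantizer and function names are assumptions.

```python
# Illustrative sketch of a Top-k + quantization compressor with error feedback,
# in the spirit of the Qsparse-local-SGD description above. Not the paper's code.
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def sign_quantize(v):
    """Crude quantizer: sign times mean magnitude of the nonzero entries."""
    nz = np.abs(v[v != 0])
    scale = nz.mean() if nz.size else 0.0
    return np.sign(v) * scale

def compress_with_error_feedback(update, memory, k):
    """Compress (memory + update); keep the untransmitted residual as memory."""
    corrected = memory + update
    compressed = sign_quantize(top_k(corrected, k))
    new_memory = corrected - compressed
    return compressed, new_memory

# toy usage: each round, the worker compresses its accumulated local update
rng = np.random.default_rng(0)
memory = np.zeros(10)
for step in range(3):
    local_update = rng.normal(size=10)   # e.g. sum of several local SGD steps
    msg, memory = compress_with_error_feedback(local_update, memory, k=3)
    print(step, np.count_nonzero(msg))
```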

CSER: Communication-efficient SGD with Error Reset [article]

Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin
2020 arXiv   pre-print
The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER.  ...  The key idea in CSER is first a new technique called "error reset" that adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of resulting local residual errors.  ...  Acknowledgments and Disclosure of Funding This work was funded in part by the following grants: NSF IIS 1909577, NSF CNS 1908888, NSF CCF 1934986 and a JP Morgan Chase Fellowship, along with computational  ... 
arXiv:2007.13221v3 fatcat:33pwl63teze7pbvm3uqppbb6nu
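
The snippet only sketches the "error reset" idea, so the toy code below shows one plausible reading: a worker accumulates the residual left over by an arbitrary compressor and periodically resets it. The stand-in compressor, the reset period H, and all names are assumptions for illustration; this is not the CSER algorithm as specified in the paper.

```python
# Toy illustration of periodically resetting a locally kept compression residual.
import numpy as np

def rand_k(v, k, rng):
    """Stand-in for an arbitrary compressor: keep k random coordinates."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(1)
residual = np.zeros(10)   # local residual error
H = 4                     # assumed reset period
for t in range(12):
    g = rng.normal(size=10)               # local stochastic gradient (toy)
    sent = rand_k(residual + g, k=3, rng=rng)
    residual = residual + g - sent        # error kept locally
    if (t + 1) % H == 0:
        residual[:] = 0.0                 # periodic error reset
```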

2020 Index IEEE Journal on Selected Areas in Information Theory Vol. 1

2020 IEEE Journal on Selected Areas in Information Theory  
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  The Subject Index contains entries describing the item under all appropriate subject headings, plus the first author's name, the publication abbreviation, month, and year, and inclusive pages.  ...  Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations.  ... 
doi:10.1109/jsait.2021.3054976 fatcat:6web2tcguzehlnpqnedkya4gzq

GRACE: A Compressed Communication Framework for Distributed Machine Learning

Hang Xu, Chen-Yu Ho, Ahmed M. Abdelmoniem, Aritra Dutta, El Houcine Bergou, Konstantinos Karatsenidis, Marco Canini, Panos Kalnis
2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)  
In this paper, we present a comprehensive survey of the most influential compressed communication methods for DNN training, together with an intuitive classification (i.e., quantization, sparsification  ...  Powerful computer clusters are used nowadays to train complex deep neural networks (DNN) on large datasets. Distributed training increasingly becomes communication bound.  ...  We propose GRACE, a unified framework with the corresponding TensorFlow and PyTorch API, and implement 16 representative compression methods.  ... 
doi:10.1109/icdcs51616.2021.00060 fatcat:zui5zowikvgchnfqrsppn4u22y
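
GRACE exposes TensorFlow and PyTorch APIs for pluggable compressors. The sketch below is a framework-agnostic caricature of such a compress/decompress interface; the class and method names are illustrative assumptions, not GRACE's actual API.

```python
# Caricature of a unified compressor interface for gradient exchange.
import numpy as np

class Compressor:
    def compress(self, tensor):
        raise NotImplementedError
    def decompress(self, payload):
        raise NotImplementedError

class TopKCompressor(Compressor):
    """Example plug-in: transmit only the k largest-magnitude entries."""
    def __init__(self, k):
        self.k = k
    def compress(self, tensor):
        flat = tensor.ravel()
        idx = np.argsort(np.abs(flat))[-self.k:]
        return (idx, flat[idx], tensor.shape)       # indices, values, shape
    def decompress(self, payload):
        idx, vals, shape = payload
        flat = np.zeros(int(np.prod(shape)))
        flat[idx] = vals
        return flat.reshape(shape)

# usage: compress a gradient before communication, decompress on receipt
grad = np.random.default_rng(2).normal(size=(4, 5))
payload = TopKCompressor(k=5).compress(grad)
restored = TopKCompressor(k=5).decompress(payload)
```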

CFedAvg: Achieving Efficient Communication and Fast Convergence in Non-IID Federated Learning [article]

Haibo Yang, Jia Liu, Elizabeth S. Bentley
2021 arXiv   pre-print
number of local steps, T is the number of total communication rounds, and m is the total number of workers.  ...  We analyze the convergence rate of CFedAvg for non-convex functions with constant and decaying learning rates.  ...  The most related work to this paper is Qsparse-local-SGD [3], which combines unbiased quantization, sparsification and local steps together and is able to recover or generalize other compression methods  ... 
arXiv:2106.07155v1 fatcat:akvol7yqrbedraq3ferqrrzwqm
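
Using the notation from the snippet (K local steps, T communication rounds, m workers), the skeleton below illustrates the general loop structure of compressed FedAvg-style training. The stand-in compressor, the toy objective, and the aggregation shown here are assumptions for illustration only, not the CFedAvg algorithm itself.

```python
# Generic skeleton: m workers each run K local SGD steps, compress their model
# delta, and the server averages the compressed deltas for T rounds.
import numpy as np

def compress(delta, k, rng):
    """Stand-in compressor: keep k random coordinates (assumption)."""
    out = np.zeros_like(delta)
    idx = rng.choice(delta.size, size=k, replace=False)
    out[idx] = delta[idx]
    return out

def local_sgd(x, K, lr, rng):
    """K local SGD steps on a toy quadratic objective ||x||^2 / 2."""
    for _ in range(K):
        grad = x + rng.normal(scale=0.1, size=x.shape)   # noisy gradient
        x = x - lr * grad
    return x

rng = np.random.default_rng(6)
d, m, T, K = 10, 4, 5, 3
x_global = rng.normal(size=d)
for t in range(T):
    deltas = []
    for i in range(m):
        x_local = local_sgd(x_global.copy(), K, lr=0.1, rng=rng)
        deltas.append(compress(x_local - x_global, k=3, rng=rng))
    x_global = x_global + np.mean(deltas, axis=0)   # server averages deltas
```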

Toward Efficient Federated Learning in Multi-Channeled Mobile Edge Network with Layered Gradient Compression [article]

Haizhou Du, Xiaojie Feng, Qiao Xiang, Haoyu Liu
2021 arXiv   pre-print
We then propose a learning-based algorithm for each device to dynamically adjust its local computation (i.e., the number of local stochastic gradient descent steps) and communication decisions (i.e.  ...  We prove the convergence of LGC, and formally define the problem of resource-efficient federated learning with LGC.  ...  Basu, D.; Data, D.; Karakus, C.; and Diggavi, S. 2019. Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations.  ... 
arXiv:2109.08819v1 fatcat:trcwuca34rdcdltkxprmx6wfca

SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization [article]

Navjot Singh, Deepesh Data, Jemin George, Suhas Diggavi
2020 arXiv   pre-print
Each node can locally compute a condition (event) which triggers a communication where quantized and sparsified local model parameters are sent.  ...  , model sparsification and quantization does not affect the overall convergence rate as compared to uncompressed decentralized training; thereby theoretically yielding communication efficiency for "free"  ...  Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.  ... 
arXiv:1910.14280v2 fatcat:oxyl532m7fdgblydpk2fegbhee
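
The snippet describes a locally computable event that triggers communication of compressed parameters. The sketch below shows one simple trigger of this kind (transmit only when the drift since the last transmission is large); the trigger form, threshold, and names are illustrative assumptions, not the exact condition analyzed in SPARQ-SGD.

```python
# Event-triggered communication sketch: send only when local drift is large.
import numpy as np

def should_communicate(x, x_last_sent, threshold):
    """Locally computable event: squared drift since last transmission."""
    return np.linalg.norm(x - x_last_sent) ** 2 > threshold

rng = np.random.default_rng(3)
x = np.zeros(10)
x_last_sent = x.copy()
for t in range(20):
    x = x - 0.1 * rng.normal(size=10)          # local SGD step (toy)
    if should_communicate(x, x_last_sent, threshold=0.5):
        # here one would sparsify/quantize (x - x_last_sent) and send it
        x_last_sent = x.copy()
```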

Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [article]

Hao Chen, Yu Ye, Ming Xiao, Mikael Skoglund, H. Vincent Poor
2020 arXiv   pre-print
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.  ...  To address two main critical challenges in distributed networks, i.e., communication bottleneck and straggler nodes (nodes with slow responses), error-control-coding based stochastic incremental ADMM is  ...  In [20], the Qsparse-local-SGD algorithm was proposed, which combines aggressive sparsification with quantization and local computation along with error compensation.  ... 
arXiv:2010.00914v1 fatcat:o7oy4w4hznehtok35l546kard4

MARINA: Faster Non-Convex Distributed Learning with Compression [article]

Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik
2022 arXiv   pre-print
We develop and analyze MARINA: a new communication efficient method for non-convex distributed learning over heterogeneous datasets.  ...  The first method is designed for the case when the local loss functions owned by clients are either of a finite sum or of an expectation form, and the second method allows for a partial participation of  ...  Gorbunov in Sections 1, 2, and C was also partially supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) 075-00337-20-03, project No. 0714-2020-0005, and in  ... 
arXiv:2102.07845v3 fatcat:iqc6prvzhjfujgj4ev53ctivey

Communication-Efficient Distributed Linear and Deep Generalized Canonical Correlation Analysis [article]

Sagar Shrestha, Xiao Fu
2021 arXiv   pre-print
The overhead issue is addressed by aggressively compressing (via quantization) the exchanged information between the distributed computing agents and a central controller.  ...  When the views are acquired and stored at different locations, organizations and edge devices, computing GCCA in a distributed, parallel and efficient manner is well-motivated.  ...  Typical compressors in the literature include SignSGD [15], QSGD [16], and Qsparse-local-SGD [18].  ... 
arXiv:2109.12400v1 fatcat:az473vazlvfxloy2jf247757ui
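
To make the "aggressive quantization of exchanged information" idea concrete, here is a standard unbiased stochastic quantizer in the spirit of QSGD [16]; the exact quantizer used in this paper may differ, and the function name and level count are assumptions.

```python
# Unbiased stochastic quantization onto a uniform grid of the normalized vector.
import numpy as np

def stochastic_quantize(v, levels, rng):
    """Quantize each entry to one of `levels` uniform levels of |v|/||v||,
    with randomized rounding so that the output equals v in expectation."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels
    lower = np.floor(scaled)
    prob = scaled - lower
    q = lower + (rng.random(v.shape) < prob)   # round up with prob = fraction
    return np.sign(v) * q * norm / levels

rng = np.random.default_rng(4)
v = rng.normal(size=5)
print(stochastic_quantize(v, levels=4, rng=rng))
```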

SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization [article]

Navjot Singh, Deepesh Data, Jemin George, Suhas Diggavi
2020 arXiv   pre-print
In SQuARM-SGD, each node performs a fixed number of local SGD (stochastic gradient descent) steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors only when there  ...  We propose and analyze SQuARM-SGD, a decentralized training algorithm, employing momentum and compressed communication between nodes regulated by a locally computable triggering rule.  ...  [BDKD19] Debraj Basu, Deepesh Data, Can Karakus, and Suhas N. Diggavi. Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations.  ... 
arXiv:2005.07041v2 fatcat:wn7e64caf5bgxjnzs7vrws4xcq

DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning [article]

Kelly Kostopoulou, Hang Xu, Aritra Dutta, Xin Li, Alexandros Ntoulas, Panos Kalnis
2021 arXiv   pre-print
Our experiments with large real models demonstrate that DeepReduce transmits fewer data and imposes lower computational overhead than existing methods, without affecting the training accuracy.  ...  Sparse tensors appear frequently in distributed deep learning, either as a direct artifact of the deep neural network's gradients, or as a result of an explicit sparsification process.  ...  The computing infrastructure was provided by the KAUST Supercomputing Lab (KSL).  ... 
arXiv:2102.03112v1 fatcat:3c6m6t5vpjbqvncqaxpdbi35qy

Convergence and Accuracy Trade-Offs in Federated Learning and Meta-Learning [article]

Zachary Charles, Jakub Konečný
2021 arXiv   pre-print
We study a family of algorithms, which we refer to as local update methods, generalizing many federated and meta-learning algorithms.  ...  Moreover, fundamental algorithmic choices (such as learning rates) explicitly govern a trade-off between the condition number of the surrogate loss and its alignment with the true loss.  ...  Debraj Basu, Deepesh Data, Can Karakus, and Suhas Diggavi. Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations.  ... 
arXiv:2103.05032v1 fatcat:wrduglafx5dhzhtugt5oiyzihu

Distributed Learning with Sparse Communications by Identification [article]

Dmitry Grishchenko, Franck Iutzeler, Jérôme Malick, Massih-Reza Amini
2020 arXiv   pre-print
This reduction comes from a random sparsification of the local updates.  ...  When computations are performed by workers on local data while a coordinator machine coordinates their updates to minimize a global loss, we present an asynchronous optimization algorithm that efficiently  ...  Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations, in Advances in Neural Information Processing Systems.  ... 
arXiv:1812.03871v2 fatcat:u4ddccfhdjcibnuuqqxpvx44va
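
The communication reduction above comes from randomly sparsifying the local updates. The sketch below is a textbook rand-k sparsifier with rescaling so the sparsified update is unbiased; it is only a generic illustration and omits the identification-based structure the paper exploits.

```python
# Generic unbiased rand-k sparsification of a local update vector.
import numpy as np

def rand_k_unbiased(v, k, rng):
    """Keep k coordinates chosen uniformly at random and rescale by d/k so
    that the sparsified vector equals v in expectation."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out

rng = np.random.default_rng(5)
update = rng.normal(size=8)
print(rand_k_unbiased(update, k=2, rng=rng))
```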
Showing results 1 — 15 out of 16