
Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization

Zhouyuan Huo, Heng Huang
We provide the first theoretical analysis on the convergence rate of asynchronous mini-batch gradient descent with variance reduction (AsySVRG) for non-convex optimization.  ...  Asynchronous stochastic gradient descent (AsySGD) has been broadly used for deep learning optimization, and it has been proved to converge at a rate of O(1/\sqrt{T}) for non-convex optimization.  ...  In this paper, we provide the convergence analysis of the asynchronous mini-batch gradient descent method with variance reduction (AsySVRG) for non-convex optimization.  ... 
doi:10.1609/aaai.v31i1.10940 fatcat:hfi4mka5gndehgfhicd2rsnyzy
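The variance-reduction scheme behind AsySVRG fits in a few lines. The following serial mini-batch SVRG loop on least squares is an illustrative sketch only (hypothetical function name; the paper additionally runs the inner updates asynchronously across threads):

```python
import numpy as np

def minibatch_svrg(X, y, lr=0.05, epochs=20, inner=100, batch=4, seed=0):
    """Serial mini-batch SVRG sketch for least squares.
    Estimator: g = grad_i(w) - grad_i(w_snap) + full_grad(w_snap)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        mu = X.T @ (X @ w_snap - y) / n              # full gradient at the snapshot
        for _ in range(inner):
            idx = rng.integers(0, n, size=batch)
            Xi, yi = X[idx], y[idx]
            gi = Xi.T @ (Xi @ w - yi) / batch        # mini-batch gradient at w
            gi_snap = Xi.T @ (Xi @ w_snap - yi) / batch  # same batch at the snapshot
            w -= lr * (gi - gi_snap + mu)            # variance-reduced step
    return w
```

The estimator is unbiased, and its variance vanishes as both the iterate and the snapshot approach the optimum, which is what enables the faster rates these papers analyze.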

Asynchronous Doubly Stochastic Group Regularized Learning

Bin Gu, Zhouyuan Huo, Heng Huang
2018 International Conference on Artificial Intelligence and Statistics  
To address this challenging problem, in this paper we propose a novel asynchronous doubly stochastic proximal gradient algorithm with variance reduction (AsyDSPG+).  ...  Asynchronous parallel stochastic optimization algorithms have received huge attention recently for handling large-scale problems.  ...  ., 2016] with mini-batch size 100. 4. AsyDSPG+: our AsyDSPG+ with mini-batch size 1. 5. AsyDSPG+ mb = 100: our AsyDSPG+ with mini-batch size 100.  ... 
dblp:conf/aistats/GuHH18 fatcat:dzghn2s2wzdgxox7tnmvwyb2ay

Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization [article]

Rui Zhu, Di Niu, Zongpeng Li
2018 arXiv   pre-print
However, compared to asynchronous parallel stochastic gradient descent (AsynSGD), an algorithm targeting smooth optimization, the understanding of the behavior of stochastic algorithms for nonsmooth regularized  ...  We study stochastic algorithms for solving nonconvex optimization problems with a convex yet possibly nonsmooth regularizer, which arise in many practical machine learning applications  ...  We fill the gap in the literature by providing convergence rates for ProxSGD under constant batch sizes without variance reduction.  ... 
arXiv:1802.08880v3 fatcat:othkev23cjbo7kbnqmm7i2ux4y
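For a concrete picture of the proximal step ProxSGD takes on a nonsmooth regularizer, here is a minimal sketch assuming an ℓ1 regularizer, whose proximal map is elementwise soft-thresholding (helper names are illustrative, not from the paper):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t * ||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proxsgd_step(w, stoch_grad, lr, lam):
    """One ProxSGD step for min_w F(w) + lam * ||w||_1: a stochastic gradient
    step on the smooth part F, then the proximal map of the regularizer."""
    return soft_threshold(w - lr * stoch_grad, lr * lam)
```

The nonsmooth part is handled exactly by the proximal map, so only the smooth part is approximated stochastically; this separation is what the asynchronous analyses above build on.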

Taming Convergence for Asynchronous Stochastic Gradient Descent with Unbounded Delay in Non-Convex Learning [article]

Xin Zhang, Jia Liu, Zhengyuan Zhu
2020 arXiv   pre-print
In this paper, we focus on Async-SGD and its variant Async-SGDI (which uses an increasing batch size) for non-convex optimization problems with unbounded gradient delays.  ...  Understanding the convergence performance of the asynchronous stochastic gradient descent method (Async-SGD) has received increasing attention in recent years due to its foundational role in machine learning  ...  Most recently, in [13] an asynchronous mini-batch SVRG with bounded delay is proposed for solving non-convex optimization problems.  ... 
arXiv:1805.09470v2 fatcat:uys3uxp56jgrdfbabhvz3xw3yy

Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization [article]

Zhouyuan Huo, Heng Huang
2016 arXiv   pre-print
We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm for non-convex optimization.  ...  However, no prior work analyzes asynchronous SGD with the variance reduction technique on non-convex problems.  ...  Conclusion In this paper, we propose and analyze two asynchronous stochastic gradient descent algorithms with variance reduction for non-convex optimization in two different distributed categories, one  ... 
arXiv:1604.03584v4 fatcat:itgph565arffhhfgarhl74nrne

Asynchronous Stochastic Frank-Wolfe Algorithms for Non-Convex Optimization

Bin Gu, Wenhan Xian, Heng Huang
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
To address this challenging problem, in this paper we propose our asynchronous stochastic Frank-Wolfe algorithm (AsySFW) and its variance reduction version (AsySVFW) for solving constrained non-convex  ...  To the best of our knowledge, AsySFW and AsySVFW are the first asynchronous parallel stochastic algorithms with convergence guarantees for solving constrained non-convex optimization problems.  ...  For example, Hogwild! [Recht et al., 2011] is a famous asynchronous parallel stochastic gradient descent algorithm for solving smooth finite-sum optimization problems.  ... 
doi:10.24963/ijcai.2019/104 dblp:conf/ijcai/GuXH19 fatcat:y2fccqpyqzhnfc2hqqdelh3fte
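The projection-free structure that distinguishes Frank-Wolfe methods can be shown with a linear minimization oracle over an ℓ1 ball; the sketch below is a generic serial Frank-Wolfe step, not the paper's asynchronous AsySFW:

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle for the l1 ball: the minimizer of <grad, s>
    over ||s||_1 <= radius puts all mass on the largest-|gradient| coordinate."""
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe_step(w, grad, k, radius=1.0):
    """Projection-free update with the classic 2/(k+2) step size:
    move toward the oracle vertex instead of projecting onto the set."""
    s = lmo_l1_ball(grad, radius)
    return w + (2.0 / (k + 2)) * (s - w)
```

Because the update is a convex combination of feasible points, iterates stay in the constraint set without any projection, which is the property the asynchronous variants preserve.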

Stochastic Momentum Method with Double Acceleration for Regularized Empirical Risk Minimization

Zhijian Luo, Siyu Chen, Yuntao Qian
2019 IEEE Access  
The momentum acceleration technique is well known for building gradient-based algorithms with fast convergence in large-scale optimization.  ...  However, the practical gain of acceleration with Nesterov's momentum is mainly a by-product of mini-batching, while acceleration merely with Katyusha momentum in stochastic steps would make the optimization  ...  The importance sampling in first-order stochastic accelerated optimization with variance reduction has been analyzed in the mini-batch setting [23].  ... 
doi:10.1109/access.2019.2953288 fatcat:nqv3vpnna5ctdfyu26rpaq7ium
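As a reference point for the momentum discussion above, here is a minimal Nesterov-style loop with the gradient evaluated at the look-ahead point (a generic textbook sketch, not the paper's doubly accelerated method):

```python
import numpy as np

def nesterov_sgd(grad, w0, lr=0.05, momentum=0.9, steps=200):
    """Nesterov-momentum (S)GD sketch: the gradient is evaluated at the
    look-ahead point w + momentum * v rather than at w itself."""
    w = w0.astype(float).copy()
    v = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w + momentum * v)   # look-ahead gradient
        v = momentum * v - lr * g    # velocity update
        w = w + v
    return w
```

The look-ahead evaluation is the only difference from heavy-ball momentum; Katyusha-style momentum additionally anchors the iterate to an SVRG snapshot.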

Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization

Cong Fang, Zhouchen Lin
This further demonstrates that, even with asynchronous updating, SVRG requires fewer Incremental First-order Oracle (IFO) calls than Stochastic Gradient Descent and Gradient Descent.  ...  We propose the Asynchronous Stochastic Variance Reduced Gradient (ASVRG) algorithm for nonconvex finite-sum problems.  ...  Variance reduction (VR) methods are a prominent family of SGD variants that ensure the descent direction has bounded variance and can therefore achieve a much faster convergence rate compared with  ... 
doi:10.1609/aaai.v31i1.10651 fatcat:o3kfopgbk5guhngxvlg3cbv7jm

Asynchronous Proximal Stochastic Gradient Algorithm for Composition Optimization Problems

Pengfei Wang, Risheng Liu, Nenggan Zheng, Zhefeng Gong
To address these challenges, we propose an asynchronous parallel algorithm, named Async-ProxSCVR, which effectively combines asynchronous parallel implementation and variance reduction method.  ...  We prove that the algorithm admits the fastest convergence rate for both strongly convex and general nonconvex cases.  ...  Also partially supported by the Hunan Provincial Science & Technology Project Foundation (2018TP1018, 2018RS3065) and the Fundamental Research Funds for the Central Universities.  ... 
doi:10.1609/aaai.v33i01.33011633 fatcat:bdljx46xhzf6fchpwdq5odz33i

Advances in Asynchronous Parallel and Distributed Optimization [article]

Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat
2020 arXiv   pre-print
Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables.  ...  Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during  ...  Convergence rates of asynchronous mini-batch algorithms and randomized coordinate descent methods for non-convex optimization are studied in [55]. Extensions of Hogwild!  ... 
arXiv:2006.13838v1 fatcat:62rqij6anfh7nodujw7dz2s6lq

Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines

Bin Gu, Zhouyuan Huo, Cheng Deng, Heng Huang
2018 International Conference on Machine Learning  
However, its convergence rate is O(1/√T) for smooth, possibly non-convex learning problems, which is significantly slower than O(1/T), the best convergence rate of (asynchronous) stochastic gradient  ...  We provide a faster convergence rate of O(1/(bT)) (b is the mini-batch size) for AsySZO+ through rigorous theoretical analysis, a significant improvement over O(1/√T).  ...  Algorithm 2 New Asynchronous Stochastic Zeroth-Order Algorithm with Variance Reduction and Mini-Batch (AsySZO+) Input: m, S, γ, {µ_j}_{j=1,...,N}, and Y.  ... 
dblp:conf/icml/GuHDH18 fatcat:dbznezvacrd2jipt7dwjyg2dtq
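The zeroth-order (derivative-free) setting of AsySZO+ replaces gradients with finite-difference estimates along random directions. A standard two-point estimator, as a serial illustrative sketch:

```python
import numpy as np

def zo_gradient(f, w, rng, mu=1e-5):
    """Two-point zeroth-order gradient estimate:
    g = d * (f(w + mu*u) - f(w)) / mu * u,  u uniform on the unit sphere.
    Since E[u u^T] = I/d, g approaches grad f(w) in expectation as mu -> 0."""
    d = w.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)               # random unit direction
    return d * (f(w + mu * u) - f(w)) / mu * u
```

Each estimate needs only two function evaluations; the variance reduction and mini-batching in AsySZO+ are what compensate for the extra noise this estimator introduces.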

Dynamic Federated Learning [article]

Elsa Rizk, Stefan Vlaski, Ali H. Sayed
2020 arXiv   pre-print
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the  ...  Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.  ...  Agents with high data variability will incur a higher variance by employing a mini-batch approximation, instead of a full gradient update, but can mitigate this effect by increasing the mini-batch size  ... 
arXiv:2002.08782v2 fatcat:l5cvo2kynnf55lbbetrjsc7bxq

Local SGD Converges Fast and Communicates Little [article]

Sebastian U. Stich
2019 arXiv   pre-print
Mini-batch stochastic gradient descent (SGD) is state of the art in large scale distributed training.  ...  We prove concise convergence rates for local SGD on convex problems and show that it converges at the same rate as mini-batch SGD in terms of number of evaluated gradients, that is, the scheme achieves  ...  Acknowledgments The author thanks Jean-Baptiste Cordonnier, Tao Lin and Kumar Kshitij Patel for spotting various typos in the first versions of this manuscript, as well as Martin Jaggi for his support.  ... 
arXiv:1805.09767v3 fatcat:cxzel3swg5calhd37lnk4b7vuu
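The local-SGD scheme described above can be sketched directly: each worker runs several SGD steps on its own model copy, then copies are averaged once per round. This is a serial simulation of the parallel scheme, with hypothetical helper names:

```python
import numpy as np

def local_sgd(stoch_grad, w0, workers=4, rounds=10, local_steps=5, lr=0.1, seed=0):
    """Local SGD sketch: `local_steps` independent SGD steps per worker,
    then one averaging (communication) step per round -- versus mini-batch
    SGD, which would synchronize after every single gradient."""
    rng = np.random.default_rng(seed)
    ws = [w0.astype(float).copy() for _ in range(workers)]
    for _ in range(rounds):
        for k in range(workers):
            for _ in range(local_steps):
                ws[k] -= lr * stoch_grad(ws[k], rng)   # local updates
        mean = np.mean(ws, axis=0)                     # communication round
        ws = [mean.copy() for _ in range(workers)]
    return ws[0]
```

Here communication happens `rounds` times instead of `rounds * local_steps` times, which is the factor-of-H saving the paper's rate analysis justifies.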

Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD [article]

Sanghamitra Dutta, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, Priya Nagpurkar
2018 arXiv   pre-print
Distributed Stochastic Gradient Descent (SGD) when run in a synchronous manner, suffers from delays in waiting for the slowest learners (stragglers).  ...  We also present a new convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions, and a novel learning rate schedule to compensate for gradient staleness.  ...  Acknowledgements The authors thank Mark Wegman, Pulkit Grover and Jianyu Wang for their suggestions and feedback.  ... 
arXiv:1803.01113v3 fatcat:tehvwbmi6bffhi2hfvv7zvqe5e
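The effect of gradient staleness in asynchronous SGD can be simulated with a delay queue: every applied gradient was computed several iterations earlier. This is a toy fixed-staleness model for intuition, not the paper's random-delay analysis:

```python
import numpy as np
from collections import deque

def sgd_with_stale_gradients(grad, w0, steps=200, delay=3, lr=0.05):
    """Toy Async-SGD model: a worker reads the current model and computes a
    gradient, but the server only applies it `delay` iterations later."""
    w = w0.astype(float).copy()
    pending = deque()
    for _ in range(steps):
        pending.append(grad(w))          # gradient computed at current iterate
        if len(pending) > delay:
            w -= lr * pending.popleft()  # stale gradient applied to the model
    return w
```

With a small enough learning rate the iteration still converges despite the delay, which matches the intuition that staleness mainly forces a more conservative step size or schedule.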

Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction [article]

Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
2016 arXiv   pre-print
Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD), have been widely used to solve the R-ERM problem.  ...  The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.  ...  With the variance reduction technique, the optimization process is divided into multiple stages (i.e., outer loop: s = 1, · · · , S).  ... 
arXiv:1609.08435v1 fatcat:bq2jy5bhxjgcrkf5fzrm3f2a24
Showing results 1 — 15 out of 400 results