
Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization [article]

Zhouyuan Huo, Heng Huang
2016 arXiv   pre-print
We provide the first theoretical analysis of the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm for non-convex optimization.  ...  Recent studies have shown that asynchronous stochastic gradient descent (SGD) algorithms with variance reduction converge at a linear rate on convex problems.  ...  Conclusion In this paper, we propose and analyze two different asynchronous stochastic gradient descent algorithms with variance reduction for non-convex optimization in two different distributed categories, one  ... 
arXiv:1604.03584v4 fatcat:itgph565arffhhfgarhl74nrne
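
As background for this entry and several below, the SVRG estimator that these asynchronous variants build on can be written in a few lines. A minimal sequential sketch (my own illustration, not code from the paper; `grad_i`, `x_snapshot`, and `full_grad_snapshot` are assumed names):

```python
def svrg_gradient(grad_i, i, x, x_snapshot, full_grad_snapshot):
    """Variance-reduced stochastic gradient used in SVRG-type methods.

    grad_i(w, i) returns the gradient of the i-th component function at w;
    x_snapshot and full_grad_snapshot stay fixed for the current epoch.
    """
    # Unbiased estimate of the full gradient; its variance is controlled by
    # how far the current iterate x has drifted from the snapshot.
    return grad_i(x, i) - grad_i(x_snapshot, i) + full_grad_snapshot
```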

Asynchronous Doubly Stochastic Group Regularized Learning

Bin Gu, Zhouyuan Huo, Heng Huang
2018 International Conference on Artificial Intelligence and Statistics  
To address this challenging problem, in this paper, we propose a novel asynchronous doubly stochastic proximal gradient algorithm with variance reduction (AsyDSPG+).  ...  Asynchronous parallel stochastic optimization algorithms have recently received considerable attention for handling large-scale problems.  ...  Meng et al. [2016] proposed an asynchronous parallel stochastic proximal optimization algorithm with the SVRG variance reduction technique.  ... 
dblp:conf/aistats/GuHH18 fatcat:dzghn2s2wzdgxox7tnmvwyb2ay
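
For orientation on the proximal step that AsyDSPG+-style methods apply after each variance-reduced gradient, here is a minimal sketch for a plain l1 regularizer (illustrative code of mine, not the paper's implementation; for group regularization the shrinkage is applied to each group's norm rather than to individual coordinates):

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (coordinate-wise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_gradient_step(x, v, step, lam):
    """One proximal gradient step using a (possibly variance-reduced)
    stochastic gradient estimate v for the smooth part."""
    return soft_threshold(x - step * v, step * lam)
```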

Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization

Zhouyuan Huo, Heng Huang
2017 Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence  
We provide the first theoretical analysis of the convergence rate of asynchronous mini-batch gradient descent with variance reduction (AsySVRG) for non-convex optimization.  ...  Asynchronous stochastic gradient descent (AsySGD) has been broadly used for deep learning optimization, and it has been proved to converge at a rate of O(1/\sqrt{T}) for non-convex optimization.  ...  In this paper, we provide the convergence analysis of the asynchronous mini-batch gradient descent with variance reduction method (AsySVRG) for non-convex optimization.  ... 
doi:10.1609/aaai.v31i1.10940 fatcat:hfi4mka5gndehgfhicd2rsnyzy
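
For context on the O(1/\sqrt{T}) statement: in the non-convex setting, convergence is measured through the expected squared gradient norm rather than the optimality gap. In standard notation (mine, not the paper's):

```latex
\min_{t \in \{0,\dots,T-1\}} \mathbb{E}\big[\|\nabla f(x_t)\|^2\big] \;\le\; O\!\left(\frac{1}{\sqrt{T}}\right) \quad \text{for AsySGD,}
```

and the variance-reduced variants studied in this line of work aim to improve that dependence to O(1/T).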

Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization [article]

Rui Zhu, Di Niu, Zongpeng Li
2018 arXiv   pre-print
However, compared to asynchronous parallel stochastic gradient descent (AsynSGD), an algorithm targeting smooth optimization, the understanding of the behavior of stochastic algorithms for nonsmooth regularized  ...  We study stochastic algorithms for solving nonconvex optimization problems with a convex yet possibly nonsmooth regularizer, which arise in many practical machine learning applications  ...  Concluding Remarks In this paper, we study asynchronous parallel implementations of stochastic proximal gradient methods for solving nonconvex optimization problems, with convex yet possibly nonsmooth  ... 
arXiv:1802.08880v3 fatcat:othkev23cjbo7kbnqmm7i2ux4y
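
The analyses in this entry and its neighbors concern what happens when proximal updates are computed from stale reads of shared memory. A toy lock-free sketch of my own (not the paper's algorithm; `grad_i` is an assumed component-gradient oracle) shows where the inconsistency enters:

```python
import threading
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def async_prox_sgd(grad_i, n, x, step, lam, iters_per_worker=1000, n_workers=4):
    """Schematic lock-free asynchronous proximal SGD on a shared iterate x.

    Each worker reads x without locking (so the read may be stale or
    inconsistent with concurrent writes), computes a stochastic gradient,
    and writes back a proximal update.
    """
    def worker(seed):
        rng = np.random.default_rng(seed)
        for _ in range(iters_per_worker):
            i = rng.integers(n)
            x_read = x.copy()                      # possibly stale snapshot
            g = grad_i(x_read, i)
            x[:] = soft_threshold(x_read - step * g, step * lam)

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```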

A Unified q-Memorization Framework for Asynchronous Stochastic Optimization

Bin Gu, Wenhan Xian, Zhouyuan Huo, Cheng Deng, Heng Huang
2020 Journal of Machine Learning Research  
Specifically, based on the q-memorization framework, 1) we propose an asynchronous stochastic gradient hard thresholding algorithm with q-memorization (AsySGHT-qM) for non-convex optimization with  ...  proximal gradient algorithm (AsySPG-qM) for convex optimization with non-smooth regularization, and prove that AsySPG-qM can achieve a linear convergence rate. 3) We propose an asynchronous stochastic  ...  Asynchronous Stochastic Gradient Descent Algorithm with Generalized Variance Reduction In this section, to solve the general non-convex smooth optimization problem (3), we first propose our AsySGD-qM algorithm  ... 
dblp:journals/jmlr/GuXHDH20 fatcat:6mqu7l6jz5gtjkmrux5qihhoxu
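
The hard-thresholding step named in AsySGHT-qM is, in its basic sequential form, a projection onto the sparsity constraint ||x||_0 <= k. A minimal illustrative sketch (not the paper's code):

```python
import numpy as np

def hard_threshold(z, k):
    """Keep the k largest-magnitude entries of z and zero out the rest,
    i.e., a Euclidean projection onto {x : ||x||_0 <= k}."""
    out = np.zeros_like(z)
    if k > 0:
        idx = np.argpartition(np.abs(z), -k)[-k:]
        out[idx] = z[idx]
    return out
```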

Taming Convergence for Asynchronous Stochastic Gradient Descent with Unbounded Delay in Non-Convex Learning [article]

Xin Zhang, Jia Liu, Zhengyuan Zhu
2020 arXiv   pre-print
In this paper, we focus on Async-SGD and its variant Async-SGDI (which uses increasing batch size) for non-convex optimization problems with unbounded gradient delays.  ...  Understanding the convergence performance of the asynchronous stochastic gradient descent method (Async-SGD) has received increasing attention in recent years due to its foundational role in machine learning  ...  In [23], asynchronous stochastic variance reduction (Async-SVR) methods were analyzed for convex objectives and bounded delay.  ... 
arXiv:1805.09470v2 fatcat:uys3uxp56jgrdfbabhvz3xw3yy
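
The object of study can be written compactly. In notation of my own choosing, the delayed mini-batch update behind Async-SGD is

```latex
x_{k+1} \;=\; x_k \;-\; \frac{\gamma_k}{|B_k|} \sum_{i \in B_k} \nabla f_i\big(x_{k-\tau_k}\big),
```

where \tau_k is the (possibly unbounded) delay of the gradient applied at step k, and Async-SGDI additionally lets the batch size |B_k| grow over time.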

Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization

Cong Fang, Zhouchen Lin
2017 Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence  
This further demonstrates that even with asynchronous updating, SVRG requires fewer Incremental First-order Oracle (IFO) calls than Stochastic Gradient Descent and Gradient Descent.  ...  We propose the Asynchronous Stochastic Variance Reduced Gradient (ASVRG) algorithm for nonconvex finite-sum problems.  ...  The standard methods to solve Eq. (1) are Gradient Descent (GD) and Stochastic Gradient Descent (SGD).  ... 
doi:10.1609/aaai.v31i1.10651 fatcat:o3kfopgbk5guhngxvlg3cbv7jm
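
For readers unfamiliar with the IFO metric used in this entry: one Incremental First-order Oracle call evaluates a single component gradient \nabla f_i. Under the standard accounting (a general convention, not a claim specific to ASVRG):

```latex
\text{GD: } n \text{ IFOs per iteration}, \qquad
\text{SGD: } 1 \text{ IFO per iteration}, \qquad
\text{SVRG-style: } n + 2m \text{ IFOs per epoch of } m \text{ inner steps.}
```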

Efficient Asynchronous Semi-stochastic Block Coordinate Descent Methods for Large-Scale SVD

Fanhua Shang, Zhihui Zhang, Yuanyuan Liu, Hongying Liu, Jing Xu
2021 IEEE Access  
Moreover, we propose a new Asynchronous parallel Semi-stochastic Block Coordinate Descent algorithm (ASBCD-SVD) and a new Asynchronous parallel Sparse approximated Variance Reduction algorithm (ASVR-SVD  ...  Unlike existing stochastic variance reduction and randomized coordinate descent methods, our algorithm inherits their advantages.  ...  Stochastic Methods: The classic stochastic variance reduction algorithms such as SVRG [22] and semi-stochastic gradient descent (S2GD) [23] have convergence guarantees for convex optimization problems  ... 
doi:10.1109/access.2021.3094282 fatcat:z6ptyfacpjer5lajuu4sx7doye
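
As a rough illustration of stochastic updates for SVD-type objectives, and explicitly not the ASBCD-SVD or ASVR-SVD algorithms from this paper, here is a toy Oja-style stochastic power iteration for the leading right singular vector:

```python
import numpy as np

def oja_leading_vector(A, step=0.1, epochs=50, seed=0):
    """Toy stochastic power iteration (Oja's rule): repeatedly nudge w along
    a_i * (a_i^T w) and renormalize, approaching the top right singular
    vector of A for a suitable step size."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for i in rng.permutation(n):
            a = A[i]
            w += step * a * (a @ w)   # stochastic ascent on the Rayleigh quotient
            w /= np.linalg.norm(w)    # retract back onto the unit sphere
    return w
```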

Asynchronous Stochastic Frank-Wolfe Algorithms for Non-Convex Optimization

Bin Gu, Wenhan Xian, Heng Huang
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
To address this challenging problem, in this paper, we propose our asynchronous stochastic Frank-Wolfe algorithm (AsySFW) and its variance reduction version (AsySVFW) for solving the constrained non-convex  ...  To the best of our knowledge, AsySFW and AsySVFW are the first asynchronous parallel stochastic algorithms with convergence guarantees for solving the constrained non-convex optimization problems.  ...  For example, Hogwild! [Recht et al., 2011] is a famous asynchronous parallel stochastic gradient descent algorithm for solving smooth finite-sum optimization problems.  ... 
doi:10.24963/ijcai.2019/104 dblp:conf/ijcai/GuXH19 fatcat:y2fccqpyqzhnfc2hqqdelh3fte
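
For orientation, the projection-free template behind these methods is the Frank-Wolfe step, whose linear minimization oracle has a closed form over simple sets such as an l1-ball. Below is a plain sequential sketch of my own under that assumption, not the asynchronous AsySFW/AsySVFW algorithms; `grad_i` is an assumed stochastic-gradient oracle:

```python
import numpy as np

def stochastic_frank_wolfe_l1(grad_i, n, d, radius=1.0, iters=1000, seed=0):
    """Toy sequential stochastic Frank-Wolfe over the l1-ball of the given radius."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for t in range(1, iters + 1):
        i = rng.integers(n)
        g = grad_i(x, i)                      # stochastic gradient estimate
        j = np.argmax(np.abs(g))
        s = np.zeros(d)
        s[j] = -radius * np.sign(g[j])        # linear minimization oracle over the l1-ball
        gamma = 2.0 / (t + 2.0)               # classic diminishing step size
        x = (1.0 - gamma) * x + gamma * s     # convex combination keeps x feasible
    return x
```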

Decoupled Asynchronous Proximal Stochastic Gradient Descent with Variance Reduction [article]

Zhouyuan Huo, Bin Gu, Heng Huang
2016 arXiv   pre-print
In this paper, we propose a faster method, the decoupled asynchronous proximal stochastic variance reduced gradient descent method (DAP-SVRG).  ...  Asynchronous optimization algorithms have emerged as a promising solution. Recently, decoupled asynchronous proximal stochastic gradient descent (DAP-SGD) was proposed to minimize a composite function.  ...  In this paper, we propose a decoupled asynchronous proximal stochastic gradient with variance reduction (DAP-SVRG), and we prove that it has linear convergence for strongly convex problems.  ... 
arXiv:1609.06804v2 fatcat:v6woa635mjaunfxk6vlptrqece
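
The serial building block behind DAP-SVRG-style methods is the standard proximal SVRG update, written here in my own notation (the "decoupled" scheduling of the asynchronous method is not shown):

```latex
% \tilde{x} is the snapshot of the current stage, F = \tfrac{1}{n}\sum_i f_i is
% the smooth part, and h is the (possibly nonsmooth) regularizer.
v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(\tilde{x}) + \nabla F(\tilde{x}),
\qquad
x_{t+1} = \operatorname{prox}_{\gamma h}\!\big(x_t - \gamma\, v_t\big).
```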

Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction

Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
2017 Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence  
Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD) have been widely used to solve the R-ERM problem.  ...  The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.  ...  Acknowledgments Zhi-Ming Ma was partially supported by National Center for Mathematics and Interdisciplinary Sciences (NCMIS) of China and NSF of China (11526214).  ... 
doi:10.1609/aaai.v31i1.10910 fatcat:zv4wy66nq5e25aywslngjwljfu
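
For reference, the two serial algorithms named in this entry take the following standard forms for a composite objective f(x) + h(x) with coordinate-separable h (notation mine):

```latex
% ProxSGD: sample a data index i_t and update all coordinates.
x^{k+1} = \operatorname{prox}_{\gamma h}\!\big(x^k - \gamma\, \nabla f_{i_t}(x^k)\big).
% ProxSCD: sample a coordinate j and update only that coordinate.
x_j^{k+1} = \operatorname{prox}_{\gamma h_j}\!\big(x_j^k - \gamma\, \nabla_{\!j} f(x^k)\big),
\qquad x_{j'}^{k+1} = x_{j'}^{k} \ \ (j' \neq j).
```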

Asynchronous Proximal Stochastic Gradient Algorithm for Composition Optimization Problems

Pengfei Wang, Risheng Liu, Nenggan Zheng, Zhefeng Gong
2019 Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence  
To address these challenges, we propose an asynchronous parallel algorithm, named Async-ProxSCVR, which effectively combines asynchronous parallel implementation and the variance reduction method.  ...  To solve this problem, the traditional stochastic gradient descent (SGD) algorithm and its variants either have a low convergence rate or are computationally expensive.  ...  Also partially supported by the Hunan Provincial Science & Technology Project Foundation (2018TP1018, 2018RS3065) and the Fundamental Research Funds for the Central Universities.  ... 
doi:10.1609/aaai.v33i01.33011633 fatcat:bdljx46xhzf6fchpwdq5odz33i
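
The composition problem referred to here nests one expectation (or finite sum) inside another; in generic notation of my own:

```latex
\min_x \; F(x) := f\big(g(x)\big) + h(x),
\qquad
\nabla\big(f \circ g\big)(x) = \big(\partial g(x)\big)^{\!\top} \nabla f\big(g(x)\big).
```

Because \nabla f must be evaluated at g(x), plugging a sampled estimate of g(x) into it generally yields a biased gradient estimate, which is why plain SGD is awkward here and why variance reduction on the inner function helps.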

Advances in Asynchronous Parallel and Distributed Optimization [article]

Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat
2020 arXiv   pre-print
The analysis provides insights as to how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods.  ...  Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables.  ...  Proximal methods for convex and non-convex optimization For ease of exposition, we have described stochastic gradient methods for smooth and strongly convex losses.  ... 
arXiv:2006.13838v1 fatcat:62rqij6anfh7nodujw7dz2s6lq
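
Complementary to the shared-memory sketches earlier in this list, the other architecture such surveys cover is the master-worker (parameter-server) pattern. A toy sketch of my own, with `grad` an assumed gradient oracle, showing where gradient delay arises:

```python
import queue
import threading

def parameter_server_async_sgd(grad, x0, step=0.1, updates=200, n_workers=4):
    """Toy parameter-server-style asynchronous SGD: workers pull the current
    iterate, compute a gradient (possibly on a stale copy), and push it back;
    the server applies gradients in whatever order they arrive."""
    x = x0.copy()
    grads = queue.Queue()
    stop = threading.Event()

    def worker():
        while not stop.is_set():
            x_stale = x.copy()            # may lag behind the server's iterate
            grads.put(grad(x_stale))

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for _ in range(updates):              # server loop: apply delayed gradients
        x -= step * grads.get()
    stop.set()
    return x
```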

Stochastic Momentum Method with Double Acceleration for Regularized Empirical Risk Minimization

Zhijian Luo, Siyu Chen, Yuntao Qian
2019 IEEE Access  
The momentum acceleration technique is well known for building gradient-based algorithms with fast convergence in large-scale optimization.  ...  In this paper, we build a stochastic and doubly accelerated momentum method (SDAMM), which incorporates Nesterov's momentum and Katyusha momentum in the framework of variance reduction, to stabilize  ...  STOCHASTIC PROXIMAL GRADIENT DESCENT AND VARIANCE REDUCTION A popular method is the randomized version of the proximal gradient descent (PGD) method, a.k.a. the stochastic proximal gradient descent (SPGD) method  ... 
doi:10.1109/access.2019.2953288 fatcat:nqv3vpnna5ctdfyu26rpaq7ium
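
As background for the two momentum terms combined in SDAMM: classical Nesterov momentum can be written in the two-step form below (notation mine), while Katyusha momentum additionally pulls the iterate back toward the variance-reduction snapshot:

```latex
y_k = x_k + \beta\,(x_k - x_{k-1}),
\qquad
x_{k+1} = y_k - \gamma\, \nabla f(y_k).
```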

Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction [article]

Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
2016 arXiv   pre-print
Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD) have been widely used to solve the R-ERM problem.  ...  The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.  ...  With the variance reduction technique, the optimization process is divided into multiple stages (i.e., outer loop: s = 1, · · · , S).  ... 
arXiv:1609.08435v1 fatcat:bq2jy5bhxjgcrkf5fzrm3f2a24
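
The staged structure mentioned in the snippet, an outer loop of snapshots s = 1, ..., S with an inner loop of variance-reduced steps, looks as follows in a minimal sequential sketch (my illustration; `grad_i` is an assumed component-gradient oracle):

```python
import numpy as np

def svrg(grad_i, n, x0, step, n_stages=10, inner_steps=None, seed=0):
    """Minimal sequential SVRG skeleton: each stage recomputes a full gradient
    at a snapshot, then runs variance-reduced stochastic updates."""
    rng = np.random.default_rng(seed)
    m = inner_steps or 2 * n
    x = x0.copy()
    for _ in range(n_stages):                                 # outer loop (stages)
        x_snap = x.copy()
        full_grad = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)
        for _ in range(m):                                    # inner loop
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_snap, i) + full_grad
            x = x - step * v
    return x
```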
Showing results 1 — 15 out of 929 results