Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization
[article]
2016
arXiv
pre-print
We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on non-convex optimization. ...
Recent studies have shown that asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge at a linear rate on convex problems. ...
Conclusion: In this paper, we propose and analyze two asynchronous stochastic gradient descent algorithms with variance reduction for non-convex optimization, covering two different distributed categories, one ...
arXiv:1604.03584v4
fatcat:itgph565arffhhfgarhl74nrne
Asynchronous Doubly Stochastic Group Regularized Learning
2018
International Conference on Artificial Intelligence and Statistics
To address this challenging problem, in this paper, we propose a novel asynchronous doubly stochastic proximal gradient algorithm with variance reduction (AsyDSPG+). ...
Asynchronous parallel stochastic optimization algorithms have recently received considerable attention for handling large-scale problems. ...
Meng et al. [2016] proposed an asynchronous parallel stochastic proximal optimization algorithm with the SVRG variance reduction technique. ...
dblp:conf/aistats/GuHH18
fatcat:dzghn2s2wzdgxox7tnmvwyb2ay
Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization
2017
Proceedings of the AAAI Conference on Artificial Intelligence
We provide the first theoretical analysis on the convergence rate of asynchronous mini-batch gradient descent with variance reduction (AsySVRG) for non-convex optimization. ...
Asynchronous stochastic gradient descent (AsySGD) has been broadly used for deep learning optimization, and it has been proved to converge at a rate of O(1/\sqrt{T}) for non-convex optimization. ...
In this paper, we provide the convergence analysis of the asynchronous mini-batch gradient descent method with variance reduction (AsySVRG) for non-convex optimization. ...
doi:10.1609/aaai.v31i1.10940
fatcat:hfi4mka5gndehgfhicd2rsnyzy
Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization
[article]
2018
arXiv
pre-print
However, compared to asynchronous parallel stochastic gradient descent (AsynSGD), an algorithm targeting smooth optimization, the understanding of the behavior of stochastic algorithms for nonsmooth regularized ...
We study stochastic algorithms for solving nonconvex optimization problems with a convex yet possibly nonsmooth regularizer, which arise in a wide range of practical machine learning applications ...
Concluding Remarks: In this paper, we study asynchronous parallel implementations of stochastic proximal gradient methods for solving nonconvex optimization problems, with convex yet possibly nonsmooth ...
arXiv:1802.08880v3
fatcat:othkev23cjbo7kbnqmm7i2ux4y
A Unified q-Memorization Framework for Asynchronous Stochastic Optimization
2020
Journal of Machine Learning Research
Specifically, based on the q-memorization framework, 1) we propose an asynchronous stochastic gradient hard thresholding algorithm with q-memorization (AsySGHT-qM) for non-convex optimization with ...
2) we propose an asynchronous stochastic proximal gradient algorithm (AsySPG-qM) for convex optimization with non-smooth regularization, and prove that AsySPG-qM can achieve a linear convergence rate. 3) We propose an asynchronous stochastic ...
Asynchronous Stochastic Gradient Descent Algorithm with Generalized Variance Reduction: In this section, to solve the general non-convex smooth optimization problem (3), we first propose our AsySGD-qM algorithm ... (a minimal sketch of the hard-thresholding step from the abstract follows this entry)
dblp:journals/jmlr/GuXHDH20
fatcat:6mqu7l6jz5gtjkmrux5qihhoxu
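To make the hard-thresholding mechanism behind AsySGHT-qM concrete, here is a minimal, hypothetical sketch in Python: a serial stochastic gradient step followed by projection onto k-sparse vectors, on a toy least-squares problem. The names hard_threshold and sght_step and all data are illustrative assumptions; this is not the paper's asynchronous q-memorization algorithm.

import numpy as np

def hard_threshold(v, k):
    # Project v onto the set of k-sparse vectors: keep the k largest-magnitude entries.
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sght_step(w, A, b, k, lr, rng):
    # One stochastic gradient hard-thresholding step on a least-squares loss (illustrative).
    i = rng.integers(len(A))
    grad = (A[i] @ w - b[i]) * A[i]
    return hard_threshold(w - lr * grad, k)

rng = np.random.default_rng(0)
A = rng.normal(size=(400, 100))
w_true = hard_threshold(rng.normal(size=100), 10)   # planted 10-sparse model
b = A @ w_true
w = np.zeros(100)
for _ in range(30000):
    w = sght_step(w, A, b, k=10, lr=5e-4, rng=rng)
print("relative residual:", np.linalg.norm(A @ w - b) / np.linalg.norm(b))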
Taming Convergence for Asynchronous Stochastic Gradient Descent with Unbounded Delay in Non-Convex Learning
[article]
2020
arXiv
pre-print
In this paper, we focus on Async-SGD and its variant Async-SGDI (which uses increasing batch size) for non-convex optimization problems with unbounded gradient delays. ...
Understanding the convergence performance of the asynchronous stochastic gradient descent method (Async-SGD) has received increasing attention in recent years due to its foundational role in machine learning ...
In [23] , asynchronous stochastic variance reduction (Async-SVR) methods were analyzed for convex objectives and bounded delay. ...
arXiv:1805.09470v2
fatcat:uys3uxp56jgrdfbabhvz3xw3yy
Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization
2017
Proceedings of the AAAI Conference on Artificial Intelligence
This further demonstrates that even with asynchronous updating, SVRG requires fewer Incremental First-order Oracle (IFO) calls than Stochastic Gradient Descent and Gradient Descent. ...
We propose the Asynchronous Stochastic Variance Reduced Gradient (ASVRG) algorithm for nonconvex finite-sum problems. ...
The standard methods to solve Eq. (1) are Gradient Descent (GD) and Stochastic Gradient Descent (SGD). ...
doi:10.1609/aaai.v31i1.10651
fatcat:o3kfopgbk5guhngxvlg3cbv7jm
Efficient Asynchronous Semi-stochastic Block Coordinate Descent Methods for Large-Scale SVD
2021
IEEE Access
Moreover, we propose a new Asynchronous parallel Semi-stochastic Block Coordinate Descent algorithm (ASBCD-SVD) and a new Asynchronous parallel Sparse approximated Variance Reduction algorithm (ASVR-SVD ...
Our algorithm inherits the advantages of existing stochastic variance reduction and randomized coordinate descent methods. ...
Stochastic Methods: The classic stochastic variance reduction algorithms such as SVRG [22] and semi-stochastic gradient descent (S2GD) [23] have convergence guarantees for convex optimization problems ...
doi:10.1109/access.2021.3094282
fatcat:z6ptyfacpjer5lajuu4sx7doye
Asynchronous Stochastic Frank-Wolfe Algorithms for Non-Convex Optimization
2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
To address this challenging problem, in this paper, we propose our asynchronous stochastic Frank-Wolfe algorithm (AsySFW) and its variance reduction version (AsySVFW) for solving the constrained non-convex ...
To the best of our knowledge, AsySFW and AsySVFW are the first asynchronous parallel stochastic algorithms with convergence guarantees for solving constrained non-convex optimization problems. ...
For example, Hogwild! [Recht et al., 2011] is a well-known asynchronous parallel stochastic gradient descent algorithm for solving smooth finite-sum optimization problems. ... (a minimal sketch of this lock-free update pattern follows this entry)
doi:10.24963/ijcai.2019/104
dblp:conf/ijcai/GuXH19
fatcat:y2fccqpyqzhnfc2hqqdelh3fte
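Several of the entries above build on Hogwild!-style lock-free updates. The following sketch illustrates that update pattern under stated assumptions: a toy least-squares problem and the illustrative names run_worker and N_WORKERS. It uses Python threads, where the GIL serializes most of the numeric work, so it shows the update pattern rather than real parallel speedup; it is not code from Recht et al. or from the cited papers.

import threading
import numpy as np

A = np.random.default_rng(0).normal(size=(1000, 20))   # toy data matrix
b = A @ np.random.default_rng(1).normal(size=20)       # targets from a planted model
w = np.zeros(20)                                       # shared parameters, read and written without locks

def run_worker(seed, steps=5000, lr=1e-3):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = rng.integers(len(A))
        grad = (A[i] @ w - b[i]) * A[i]   # stochastic gradient computed on a possibly stale w
        w[:] = w - lr * grad              # lock-free write; conflicting updates are simply tolerated

N_WORKERS = 4
threads = [threading.Thread(target=run_worker, args=(s,)) for s in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("relative residual:", np.linalg.norm(A @ w - b) / np.linalg.norm(b))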
Decoupled Asynchronous Proximal Stochastic Gradient Descent with Variance Reduction
[article]
2016
arXiv
pre-print
In this paper, we propose a faster method, decoupled asynchronous proximal stochastic variance reduced gradient descent method (DAP-SVRG). ...
Asynchronous optimization algorithms have emerged as a promising solution. Recently, decoupled asynchronous proximal stochastic gradient descent (DAP-SGD) was proposed to minimize a composite function. ...
In this paper, we propose a decoupled asynchronous proximal stochastic gradient algorithm with variance reduction (DAP-SVRG), and we prove that it achieves linear convergence for strongly convex problems. ...
arXiv:1609.06804v2
fatcat:v6woa635mjaunfxk6vlptrqece
Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction
2017
Proceedings of the AAAI Conference on Artificial Intelligence
Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD) have been widely used to solve the R-ERM problem. ...
The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction. ...
Acknowledgments Zhi-Ming Ma was partially supported by National Center for Mathematics and Interdisciplinary Sciences (NCMIS) of China and NSF of China (11526214). ...
doi:10.1609/aaai.v31i1.10910
fatcat:zv4wy66nq5e25aywslngjwljfu
Asynchronous Proximal Stochastic Gradient Algorithm for Composition Optimization Problems
2019
Proceedings of the AAAI Conference on Artificial Intelligence
To address these challenges, we propose an asynchronous parallel algorithm, named Async-ProxSCVR, which effectively combines asynchronous parallel implementation and variance reduction method. ...
To solve this problem, the traditional stochastic gradient descent (SGD) algorithm and its variants either have a low convergence rate or are computationally expensive. ...
Also partially supported by the Hunan Provincial Science & Technology Project Foundation (2018TP1018, 2018RS3065) and the Fundamental Research Funds for the Central Universities. ...
doi:10.1609/aaai.v33i01.33011633
fatcat:bdljx46xhzf6fchpwdq5odz33i
Advances in Asynchronous Parallel and Distributed Optimization
[article]
2020
arXiv
pre-print
The analysis provides insights as to how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods. ...
Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. ...
Proximal methods for convex and non-convex optimization: For ease of exposition, we have described stochastic gradient methods for smooth and strongly convex losses. ...
arXiv:2006.13838v1
fatcat:62rqij6anfh7nodujw7dz2s6lq
Stochastic Momentum Method with Double Acceleration for Regularized Empirical Risk Minimization
2019
IEEE Access
The momentum acceleration technique is well known for building gradient-based algorithms with fast convergence in large-scale optimization. ...
In this paper, we build a stochastic and doubly accelerated momentum method (SDAMM), which incorporates Nesterov's momentum and Katyusha momentum in the framework of variance reduction, to stabilize ...
Stochastic Proximal Gradient Descent and Variance Reduction: A popular method is the randomized version of the proximal gradient descent (PGD) method, a.k.a. the stochastic proximal gradient descent (SPGD) method ... (a minimal sketch of one SPGD step follows this entry)
doi:10.1109/access.2019.2953288
fatcat:nqv3vpnna5ctdfyu26rpaq7ium
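The randomized proximal gradient step mentioned in the last snippet can be sketched as follows, assuming an l1 regularizer whose proximal operator is soft-thresholding, applied to a toy least-squares loss. The names soft_threshold and spgd_step and the data are illustrative assumptions, not APIs from the cited paper.

import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def spgd_step(w, A, b, lam, lr, rng):
    i = rng.integers(len(A))                        # sample one data point
    grad = (A[i] @ w - b[i]) * A[i]                 # stochastic gradient of the smooth part
    return soft_threshold(w - lr * grad, lr * lam)  # gradient step, then prox of the l1 term

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 50))
w_true = rng.normal(size=50) * (rng.random(50) < 0.2)   # sparse ground truth
b = A @ w_true
w = np.zeros(50)
for _ in range(20000):
    w = spgd_step(w, A, b, lam=0.1, lr=1e-3, rng=rng)
print("nonzero coefficients:", np.count_nonzero(w))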
Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction
[article]
2016
arXiv
pre-print
Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD) have been widely used to solve the R-ERM problem. ...
The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction. ...
With the variance reduction technique, the optimization process is divided into multiple stages (i.e., an outer loop over s = 1, ..., S). ... (a minimal serial SVRG sketch of this stage structure follows this entry)
arXiv:1609.08435v1
fatcat:bq2jy5bhxjgcrkf5fzrm3f2a24
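The stage structure described in the last snippet (an outer loop that takes a snapshot and computes a full gradient, plus an inner loop of variance-reduced stochastic steps) is the standard serial SVRG scheme. Below is a minimal sketch on a toy least-squares problem; the function name svrg and the data are illustrative assumptions, and the sketch deliberately omits the asynchronous execution that is the paper's actual contribution.

import numpy as np

def svrg(A, b, n_stages=20, inner_steps=500, lr=5e-3, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(n_stages):                   # outer loop over stages s = 1, ..., S
        w_snap = w.copy()                       # snapshot of the current iterate
        full_grad = A.T @ (A @ w_snap - b) / n  # full gradient at the snapshot
        for _ in range(inner_steps):            # inner loop of variance-reduced steps
            i = rng.integers(n)
            g_i = (A[i] @ w - b[i]) * A[i]
            g_snap_i = (A[i] @ w_snap - b[i]) * A[i]
            w = w - lr * (g_i - g_snap_i + full_grad)
    return w

A = np.random.default_rng(1).normal(size=(1000, 20))
b = A @ np.ones(20)
w = svrg(A, b)
print("relative residual:", np.linalg.norm(A @ w - b) / np.linalg.norm(b))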
Showing results 1 — 15 out of 929 results