3,651 Hits in 3.5 sec

Stochastic Variance Reduction for Nonconvex Optimization [article]

Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola
2016 arXiv   pre-print
We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them.  ...  SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes  ...  Lemma 4 (in Section G of the appendix) shows the reduction in the variance of stochastic gradients with mini-batch size b.  ... 
arXiv:1603.06160v2 fatcat:doeqpw2gcfh3tn4iuoyj3jmwbq
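
For orientation, the SVRG recipe analyzed in this line of work pairs an occasional full gradient at a snapshot point with cheap per-iteration corrections. The sketch below is a minimal NumPy illustration for a smooth finite-sum objective; the function names, step size, and loop lengths are placeholder assumptions, not the authors' implementation.

    import numpy as np

    def svrg(grad_i, x0, n, step=0.01, outer_iters=20, inner_iters=None, rng=None):
        """Minimal SVRG sketch for min_x (1/n) sum_i f_i(x) with smooth f_i.

        grad_i(x, i) returns the gradient of the i-th component function at x.
        """
        rng = rng or np.random.default_rng(0)
        inner_iters = inner_iters or n
        x = np.asarray(x0, dtype=float)
        for _ in range(outer_iters):
            x_snap = x.copy()
            # Full gradient at the snapshot serves as the control variate.
            full_grad = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)
            for _ in range(inner_iters):
                i = rng.integers(n)
                # Variance-reduced gradient: unbiased, and its variance shrinks
                # as the iterate approaches the snapshot point.
                v = grad_i(x, i) - grad_i(x_snap, i) + full_grad
                x = x - step * v
        return x

For a least-squares component f_i(x) = 0.5 * (a_i @ x - b_i)**2, for example, grad_i(x, i) would return a_i * (a_i @ x - b_i); the mini-batch variant averages the correction over b sampled indices, which is the quantity the snippet's Lemma 4 bounds.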

Stochastic Nested Variance Reduction for Nonconvex Optimization [article]

Dongruo Zhou and Pan Xu and Quanquan Gu
2020 arXiv   pre-print
We propose a new stochastic gradient descent algorithm based on nested variance reduction.  ...  We study finite-sum nonconvex optimization problems, where the objective function is an average of n nonconvex functions.  ...  We also thank AWS for providing cloud computing credits associated with the NSF BIGDATA award.  ... 
arXiv:1806.07811v2 fatcat:lxjstxmqk5cpvln5wh5bkzkdry

Stochastic Alternating Direction Method of Multipliers with Variance Reduction for Nonconvex Optimization [article]

Feihu Huang, Songcan Chen, Zhaosong Lu
2017 arXiv   pre-print
In this paper, we study the stochastic alternating direction method of multipliers (ADMM) for nonconvex optimization, and propose three classes of nonconvex stochastic ADMM with variance reduction  ...  In particular, we provide a general framework to analyze the iteration complexity of these nonconvex stochastic ADMM methods with variance reduction.  ...  Moreover, we propose three classes of nonconvex stochastic ADMM with variance reduction for problem (1), based on different variance-reduced stochastic gradients.  ... 
arXiv:1610.02758v5 fatcat:33stxzakzfgqfpl47s2bka3k7y

Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization [article]

Yi Zhou, Zhe Wang, Kaiyi Ji, Yingbin Liang, Vahid Tarokh
2019 arXiv   pre-print
Two new stochastic variance-reduced algorithms named SARAH and SPIDER have been recently proposed, and SPIDER has been shown to achieve a near-optimal gradient oracle complexity for nonconvex optimization  ...  However, existing momentum schemes used in variance-reduced algorithms are designed specifically for convex optimization, and are not applicable to nonconvex scenarios.  ...  ., 2018), the authors proposed a nested stochastic variance reduction scheme for nonconvex optimization and achieved the same order-level oracle complexity result as that of SPIDER.  ... 
arXiv:1902.02715v3 fatcat:arhpvqvorngv3pqlktcslgkbbu
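
The SARAH/SPIDER estimator these momentum schemes build on differs from SVRG in that the control variate is updated recursively at every step instead of being anchored to a snapshot. Below is a hedged sketch of that recursion; the batch size, epoch length, and step size are illustrative assumptions rather than the schedules analyzed in the paper.

    import numpy as np

    def spider_sketch(grad_i, x0, n, step=0.01, epochs=20, epoch_len=None,
                      batch=8, rng=None):
        """Illustrative SARAH/SPIDER-style recursive gradient estimator."""
        rng = rng or np.random.default_rng(0)
        epoch_len = epoch_len or n
        x = np.asarray(x0, dtype=float)
        for _ in range(epochs):
            # Restart the estimator with a full (or large-batch) gradient.
            v = np.mean([grad_i(x, i) for i in range(n)], axis=0)
            for _ in range(epoch_len):
                x_prev = x.copy()
                x = x - step * v
                idx = rng.integers(n, size=batch)
                # Recursive correction: shift the previous estimate by the
                # mini-batch gradient difference between the new and old iterates.
                v = v + np.mean([grad_i(x, i) - grad_i(x_prev, i) for i in idx],
                                axis=0)
        return x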

Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization [article]

Sijia Liu and Bhavya Kailkhura and Pin-Yu Chen and Paishun Ting and Shiyu Chang and Lisa Amini
2018 arXiv   pre-print
To mitigate this error, we propose two accelerated versions of ZO-SVRG utilizing variance-reduced gradient estimators, which achieve the best rate known for ZO stochastic optimization (in terms of iterations  ...  As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance-reduced and faster converging approaches is also intensifying.  ...  In [7], an asynchronous ZO stochastic coordinate descent (ZO-SCD) was derived for parallel optimization and achieved the rate of O(√d/√T).  ... 
arXiv:1805.10367v2 fatcat:4vqqvotsbvahfdrwpjgdgwjyfy
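
In the zeroth-order (ZO) setting only function values are available, so the stochastic gradients fed into the SVRG-style loop are themselves finite-difference estimates. A minimal sketch of the random-direction two-point estimator commonly used for this, with the smoothing radius and number of directions chosen as illustrative assumptions:

    import numpy as np

    def zo_gradient(f, x, num_dirs=10, mu=1e-3, rng=None):
        """Two-point random-direction estimate of grad f(x) from function values only."""
        rng = rng or np.random.default_rng(0)
        d = x.shape[0]
        fx = f(x)
        g = np.zeros(d)
        for _ in range(num_dirs):
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            # Directional finite difference, scaled by the dimension d.
            g += (d * (f(x + mu * u) - fx) / mu) * u
        return g / num_dirs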

Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization

Cong Fang, Zhouchen Lin
2017 Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)  
We propose the Asynchronous Stochastic Variance Reduced Gradient (ASVRG) algorithm for nonconvex finite-sum problems.  ...  In this paper, we study Stochastic Variance Reduced Gradient (SVRG) in the asynchronous setting.  ...  In this paper, we study the asynchronous variant of SVRG for nonconvex optimization problems.  ... 
doi:10.1609/aaai.v31i1.10651 fatcat:o3kfopgbk5guhngxvlg3cbv7jm

SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms [article]

Zhe Wang, Kaiyi Ji, Yi Zhou, Yingbin Liang, Vahid Tarokh
2020 arXiv   pre-print
SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization.  ...  We further develop a novel momentum scheme to accelerate SpiderBoost for composite optimization, which achieves the near-optimal oracle complexity in theory and substantial improvement in experiments.  ...  Related Work: Stochastic algorithms for smooth nonconvex optimization: The convergence analysis for SGD was studied in [11] for smooth nonconvex optimization.  ... 
arXiv:1810.10690v3 fatcat:wsqob5keurbihg6x7ee3lwfbby

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

Dongruo Zhou, Pan Xu, Quanquan Gu
2018 Neural Information Processing Systems  
We propose a new stochastic gradient descent algorithm based on nested variance reduction.  ...  We study finite-sum nonconvex optimization problems, where the objective function is an average of n nonconvex functions.  ...  We also thank AWS for providing cloud computing credits associated with the NSF BIGDATA award.  ... 
dblp:conf/nips/ZhouXG18 fatcat:4lfmbo2c6vbwjpvdurtnlfof5e

Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization

Feihu Huang, Bin Gu, Zhouyuan Huo, Songcan Chen, Heng Huang
2019 Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)  
To fill this gap, in this paper, we propose a class of faster zeroth-order proximal stochastic methods with the variance reduction techniques of SVRG and SAGA, which are denoted as ZO-ProxSVRG and ZO-ProxSAGA  ...  However, its convergence rate is O(1/√T) for nonconvex problems, which is significantly slower than the best convergence rate O(1/T) of the zeroth-order stochastic algorithm, where T is the iteration  ...  To accelerate optimization, more recently, Liu et al. (2018c,a) proposed the zeroth-order stochastic variance reduction gradient (ZO-SVRG) methods.  ... 
doi:10.1609/aaai.v33i01.33011503 fatcat:jaudjs4vobbo3fitemtbzkqvdu
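
The "proximal" part of these gradient-free methods handles the nonsmooth regularizer; for the common l1 penalty the proximal operator is plain soft-thresholding. A small sketch of one proximal-gradient step, with the gradient estimate left abstract (in the paper it would come from a ZO-SVRG or ZO-SAGA estimator; the names here are illustrative):

    import numpy as np

    def soft_threshold(z, tau):
        """Proximal operator of tau * ||.||_1: shrink each coordinate toward zero."""
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def prox_gradient_step(x, grad_estimate, step, lam):
        """One proximal-gradient step for min_x f(x) + lam * ||x||_1."""
        return soft_threshold(x - step * grad_estimate, step * lam)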

Stochastic Variance-Reduced ADMM [article]

Shuai Zheng, James T. Kwok
2016 arXiv   pre-print
Recently, stochastic ADMM has been integrated with variance reduction methods for stochastic gradient, leading to SAG-ADMM and SDCA-ADMM that have fast convergence rates and low iteration complexities.  ...  We also extend the proposed method for nonconvex problems, and obtain a convergence rate of O(1/T).  ...  As can be seen, stochastic ADMM methods with variance reduction (SVRG-ADMM, SAG-ADMM and SDCA-ADMM) have fast convergence, while those that do not use variance reduction are much slower.  ... 
arXiv:1604.07070v3 fatcat:i7eqkowjybbr5pxkkvmtg6czxm

Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization [article]

Rui Zhu, Di Niu, Zongpeng Li
2018 arXiv   pre-print
Furthermore, our results are also the first to show the convergence of any stochastic proximal methods without assuming an increasing batch size or the use of additional variance reduction techniques.  ...  We study stochastic algorithms for solving nonconvex optimization problems with a convex yet possibly nonsmooth regularizer, which arise widely in practical machine learning applications  ...  (Reddi et al., 2016) prove convergence for nonconvex problems under a constant minibatch size, yet rely on additional mechanisms for variance reduction.  ... 
arXiv:1802.08880v3 fatcat:othkev23cjbo7kbnqmm7i2ux4y

Stochastic Frank-Wolfe Methods for Nonconvex Optimization [article]

Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola
2016 arXiv   pre-print
For objective functions that decompose into a finite-sum, we leverage ideas from variance reduction techniques for convex optimization to obtain new variance-reduced nonconvex Frank-Wolfe methods that  ...  We study Frank-Wolfe methods for nonconvex stochastic and finite-sum optimization problems.  ...  Variance Reduction in Stochastic Setting: In this section, we improve the convergence rates in the stochastic setting using variance reduction techniques.  ... 
arXiv:1607.08254v2 fatcat:ooe2ra5hsbafvljkr6f4rp7zfi
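
Frank-Wolfe methods stay projection-free by querying a linear minimization oracle (LMO) over the constraint set and moving toward its output, which is where the variance-reduced gradient estimate enters. A minimal sketch assuming an l1-ball constraint and the classic 2/(t+2) step size; both choices are illustrative, not the paper's setting.

    import numpy as np

    def l1_ball_lmo(grad, radius=1.0):
        """argmin of <grad, s> over ||s||_1 <= radius: a signed vertex of the l1 ball."""
        s = np.zeros_like(grad)
        i = np.argmax(np.abs(grad))
        s[i] = -radius * np.sign(grad[i])
        return s

    def frank_wolfe_step(x, grad_estimate, t, radius=1.0):
        """One Frank-Wolfe update: move toward the LMO output."""
        s = l1_ball_lmo(grad_estimate, radius)
        gamma = 2.0 / (t + 2.0)
        return x + gamma * (s - x)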

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

Rie Johnson, Tong Zhang
2013 Neural Information Processing Systems  
To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG).  ...  Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance.  ...  Acknowledgment We thank Leon Bottou and Alekh Agarwal for spotting a mistake in the original theorem.  ... 
dblp:conf/nips/Johnson013 fatcat:ocmuty6ydbdcthtwmg3zftysvy

R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate [article]

Jingzhao Zhang, Hongyi Zhang, Suvrit Sra
2018 arXiv   pre-print
We study smooth stochastic optimization problems on Riemannian manifolds.  ...  Unlike previous works, by not resorting to bounding iterate distances, our analysis yields curvature independent convergence rates for both the nonconvex and strongly convex cases.  ...  We analyze R-SPIDER for optimizing geodesically smooth stochastic nonconvex functions.  ... 
arXiv:1811.04194v3 fatcat:fjf5cvjym5ab7nr5m433cpzzyy
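
Riemannian stochastic methods of this kind replace the Euclidean update with a tangent-space step followed by a retraction or exponential map. A hedged sketch for the simplest manifold, the unit sphere, using normalization as the retraction; this illustrates only the geometry, not R-SPIDER's recursive estimator or step-size rules.

    import numpy as np

    def sphere_rgrad(x, euclidean_grad):
        """Riemannian gradient on the unit sphere: project onto the tangent space at x."""
        return euclidean_grad - np.dot(x, euclidean_grad) * x

    def sphere_retract(x, step, rgrad):
        """Step in the tangent space, then retract back onto the sphere."""
        y = x - step * rgrad
        return y / np.linalg.norm(y)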

Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems [article]

Luo Luo, Haishan Ye, Zhichao Huang, Tong Zhang
2020 arXiv   pre-print
In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction.  ...  This method achieves the best known stochastic gradient complexity of 𝒪(κ^3ε^-3), and its dependency on ε is optimal for this problem.  ...  Stochastic primal-dual coordinate method for regularized empirical risk minimization. Dongruo Zhou, Pan Xu, and Quanquan Gu. Stochastic nested variance reduction for nonconvex optimization.  ... 
arXiv:2001.03724v2 fatcat:rg2yygen7fg5jg6udqb7ru2zji
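
For context on the minimax setting, the baseline that variance-reduced estimators like SREDA's improve on is plain stochastic gradient descent ascent: descend in the nonconvex variable, ascend in the strongly concave one, usually with different step sizes. A minimal sketch under those assumptions (not SREDA's recursive estimator; the step sizes and function names are placeholders):

    import numpy as np

    def sgda(grad_x, grad_y, x0, y0, sample, eta_x=1e-3, eta_y=1e-2,
             iters=1000, rng=None):
        """Stochastic gradient descent ascent for min_x max_y E[f(x, y; xi)].

        grad_x(x, y, xi) and grad_y(x, y, xi) return stochastic partial gradients;
        sample(rng) draws a data point xi.
        """
        rng = rng or np.random.default_rng(0)
        x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
        for _ in range(iters):
            xi = sample(rng)
            gx, gy = grad_x(x, y, xi), grad_y(x, y, xi)
            x = x - eta_x * gx  # descent step on the nonconvex min variable
            y = y + eta_y * gy  # ascent step on the strongly concave max variable
        return x, y
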
Showing results 1 — 15 out of 3,651 results