
Escaping Saddles with Stochastic Gradients [article]

Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann
2018 arXiv   pre-print
Based upon this observation we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully  ...  We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these  ...  Experiments: In this section we first show that vanilla SGD (Algorithm 2) as well as GD with a stochastic gradient step as perturbation (Algorithm 1) indeed escape saddle points.  ...
arXiv:1803.05999v2 fatcat:ww74scxrovbgnpskbyys7g4veq
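For concreteness, here is a minimal NumPy sketch of the scheme described in the snippet: gradient descent that, whenever the full gradient becomes small, takes a single stochastic-gradient step as the perturbation instead of injecting isotropic noise. The interface (`grad_full`, `grad_stoch`), the threshold `g_thresh`, and the step size are illustrative assumptions, not the paper's exact Algorithm 1.

```python
import numpy as np

def gd_with_sgd_perturbation(grad_full, grad_stoch, x0, eta=0.1,
                             g_thresh=1e-3, n_iters=1000, rng=None):
    """GD that uses one stochastic-gradient step as the perturbation
    whenever the full gradient is small (sketch of the idea above;
    all hyperparameters are illustrative)."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        g = grad_full(x)
        if np.linalg.norm(g) <= g_thresh:
            # Near a stationary point: perturb with a stochastic gradient, whose
            # component along negative-curvature directions drives the escape.
            x = x - eta * grad_stoch(x, rng)
        else:
            x = x - eta * g
    return x
```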

Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods [article]

Guannan Liang, Qianqian Tong, Chunjiang Zhu, Jinbo Bi
2021 arXiv   pre-print
Stochastically controlled stochastic gradient (SCSG) methods have been proved to converge efficiently to first-order stationary points which, however, can be saddle points in nonconvex optimization.  ...  Simulation studies illustrate that the proposed algorithm can escape saddle points in much fewer epochs than the gradient descent methods perturbed by either noise injection or a SGD step.  ...  Stochastic gradient descent escapes saddle points efficiently. arXiv preprint arXiv:1902.04811, 2019. Rie Johnson and Tong Zhang.  ... 
arXiv:2103.04413v3 fatcat:7xfbarbp5zam5e6gfahrtng2aq
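For orientation, a rough sketch of a generic SCSG outer/inner loop (large-batch reference gradient, then a geometrically distributed number of variance-reduced inner steps). The batch sizes, step size, `grad_batch(x, idx)` interface, and geometric parameter are illustrative assumptions, and the saddle-escaping modifications proposed in the paper are not shown.

```python
import numpy as np

def scsg_sketch(grad_batch, n, x0, eta=0.05, B=256, b=16, n_outer=100, rng=None):
    """Generic SCSG sketch; grad_batch(x, idx) returns the average gradient
    over the samples indexed by idx (illustrative interface)."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_outer):
        big = rng.choice(n, size=min(B, n), replace=False)
        mu = grad_batch(x, big)            # large-batch reference gradient
        x_ref = x.copy()
        T = rng.geometric(b / (B + b))     # geometrically distributed inner length
        for _ in range(T):
            small = rng.choice(big, size=b, replace=True)
            v = grad_batch(x, small) - grad_batch(x_ref, small) + mu
            x = x - eta * v                # SVRG-style corrected step
    return x
```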

Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time [article]

Hejian Sang, Jia Liu
2018 arXiv   pre-print
All proposed algorithms can escape from saddle points with at most O(log d) iterations, which is nearly dimension-free.  ...  In this paper, we propose a new adaptive stochastic gradient Langevin dynamics (ASGLD) algorithmic framework and its two specialized versions, namely adaptive stochastic gradient (ASG) and adaptive gradient  ...  full gradient information in identifying escape directions, which implies that they do not work in cases where only stochastic gradients are available.  ...
arXiv:1805.09416v1 fatcat:5qbqlfpfnnfkth3ogmlwzahw4a
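As a point of reference for the snippet, here is the plain (non-adaptive) stochastic gradient Langevin dynamics update; the adaptive scaling that defines ASGLD is not spelled out in the excerpt, so only the base update is shown, with an illustrative step size and inverse temperature.

```python
import numpy as np

def sgld_step(x, stoch_grad, eta=1e-3, beta=1e3, rng=None):
    """One plain SGLD step: a stochastic-gradient step plus isotropic
    Gaussian noise scaled as sqrt(2*eta/beta) (not the adaptive ASGLD rule)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(np.shape(x))
    return x - eta * stoch_grad(x) + np.sqrt(2.0 * eta / beta) * noise
```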

Linear Speedup in Saddle-Point Escape for Decentralized Non-Convex Optimization [article]

Stefan Vlaski, Ali H. Sayed
2019 arXiv   pre-print
We establish linear speedup in saddle-point escape time in the number of agents for symmetric combination policies and study the potential for further improvement by employing asymmetric combination weights  ...  Under appropriate cooperation protocols and parameter choices, fully decentralized solutions for stochastic optimization have been shown to match the performance of centralized solutions and result in  ...  Recently these results have been extended to decentralized optimization with deterministic gradients and random initialization [33] as well as stochastic gradients with diminishing step-size and decaying  ...
arXiv:1910.13852v1 fatcat:ohvqqkggoneltf4kebtq5qex5q
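A compact sketch of the decentralized, diffusion-type stochastic gradient recursion the entry refers to: each agent adapts with a local stochastic gradient and then combines with its neighbors through a combination matrix; the symmetric versus asymmetric choice of that matrix is what the speedup analysis is about. The `stoch_grads` interface and the step size below are illustrative assumptions.

```python
import numpy as np

def diffusion_sgd_round(X, stoch_grads, A, mu=0.05, rng=None):
    """One adapt-then-combine round for K agents.

    X           : (K, d) array; row k is agent k's iterate
    stoch_grads : list of K callables, stoch_grads[k](x, rng) -> gradient sample
    A           : (K, K) combination matrix whose columns sum to one
    """
    rng = rng or np.random.default_rng(0)
    K, _ = X.shape
    # Adapt: each agent takes a local stochastic-gradient step.
    Psi = np.stack([X[k] - mu * stoch_grads[k](X[k], rng) for k in range(K)])
    # Combine: x_k = sum_l A[l, k] * psi_l, i.e. a convex combination of neighbors.
    return A.T @ Psi
```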

A Realistic Example in 2 Dimension that Gradient Descent Takes Exponential Time to Escape Saddle Points [article]

Shiliang Zuo
2020 arXiv   pre-print
In this paper we show a negative result: gradient descent may take exponential time to escape saddle points, even with non-pathological two-dimensional functions.  ...  Through our analysis we demonstrate that stochasticity is essential to escape saddle points efficiently.  ...  In fact, it is fairly easy to construct a function with an artificial initialization scheme such that gradient descent will take an exponential number of iterates to escape a saddle point.  ...
arXiv:2008.07513v1 fatcat:o2wndqztqzbxtnk7mflrdr3you

Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia [article]

Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama
2021 arXiv   pre-print
We prove that Adaptive Learning Rate can escape saddle points efficiently, but cannot select flat minima as SGD does.  ...  Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and minima selection.  ...  Escaping saddles with stochastic gradients. In International Conference on Machine Learning, pp. 1155-1164, 2018. Dauphin, Y.  ...
arXiv:2006.15815v9 fatcat:gvbgk7wtd5dvxlpn4vcm2fgg64
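To make the two ingredients the abstract disentangles concrete, here is a generic Adam-style update with the momentum (inertia) term and the adaptive, per-coordinate learning-rate term marked separately; this is plain Adam for reference, not the authors' Adai rule, and the hyperparameters are the usual defaults.

```python
import numpy as np

def adam_style_step(x, g, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step at iteration t >= 1: m carries Momentum (inertia),
    v carries the Adaptive Learning Rate (second-moment preconditioner)."""
    m = beta1 * m + (1 - beta1) * g          # momentum / inertia
    v = beta2 * v + (1 - beta2) * g * g      # adaptive second moment
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    x = x - eta * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v
```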

Sharp Analysis for Nonconvex SGD Escaping from Saddle Points [article]

Cong Fang, Zhouchen Lin, Tong Zhang
2019 arXiv   pre-print
In this paper, we give a sharp analysis for Stochastic Gradient Descent (SGD) and prove that SGD is able to efficiently escape from saddle points and find an (ϵ, O(ϵ^0.5))-approximate second-order stationary  ...  point in Õ(ϵ^-3.5) stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions  ...  Acknowledgement: The authors would like to greatly thank Chris Junchi Li for providing us with a proof that SGD escapes saddle points in Õ(ϵ^-4) computational cost and for carefully revising our paper.  ...
arXiv:1902.00247v2 fatcat:q2olwny57revbl5z7vytcn5gfq
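The (ϵ, O(ϵ^0.5))-approximate second-order stationary point in the abstract is the usual notion: small gradient norm and a not-too-negative smallest Hessian eigenvalue. A small NumPy check of that definition, suitable only for low-dimensional test problems where the Hessian can be formed explicitly:

```python
import numpy as np

def is_approx_sosp(grad, hess, x, eps, eps_h):
    """True if x is an (eps, eps_h)-approximate second-order stationary point:
    ||grad f(x)|| <= eps and lambda_min(hess f(x)) >= -eps_h."""
    g = grad(x)
    lam_min = np.linalg.eigvalsh(hess(x)).min()
    return np.linalg.norm(g) <= eps and lam_min >= -eps_h
```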

Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently [article]

Yaodong Yu and Difan Zou and Quanquan Gu
2017 arXiv   pre-print
We propose a family of nonconvex optimization algorithms that are able to save gradient and negative curvature computations to a large extent, and are guaranteed to find an approximate local minimum with  ...  Our novel analysis shows that the proposed algorithms can escape the small gradient region in only one negative curvature descent step whenever they enter it, and thus they only need to perform at most  ...  As we will prove later in this section, one can use Algorithm 1 to escape a saddle point x with λ_min(∇²f(x)) < −ε_H.  ...
arXiv:1712.03950v1 fatcat:nv3yjvunxjbczn4xw6hyrx42lu
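A schematic negative curvature descent step of the kind the snippet mentions: given the eigenvector of the Hessian's most negative eigenvalue, move along whichever sign of that direction decreases the objective. The exact eigendecomposition here is for illustration only; the point of the paper is precisely to limit how often such curvature information must be computed.

```python
import numpy as np

def negative_curvature_step(f, hess, x, alpha=0.1):
    """Take one step along the most negative curvature direction of f at x
    (step size alpha is illustrative; exact eigendecomposition for clarity)."""
    eigvals, eigvecs = np.linalg.eigh(hess(x))
    if eigvals[0] >= 0:          # no negative curvature at x: nothing to escape
        return x
    v = eigvecs[:, 0]            # eigenvector of the most negative eigenvalue
    # The sign of v is arbitrary, so pick the direction that actually decreases f.
    return min((x + alpha * v, x - alpha * v), key=f)
```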

Gradient Descent Can Take Exponential Time to Escape Saddle Points [article]

Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabas Poczos, Aarti Singh
2017 arXiv   pre-print
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions  ...  On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points - it can find an approximate local minimizer in polynomial time.  ...  In particular, we expect that with random initialization, general stochastic gradient descent will need exponential time to escape saddle points in the worst case.  ... 
arXiv:1705.10412v2 fatcat:ekjhh5ccrnhwncju7gmdrrj2gu
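For contrast with the exponential-time result, a minimal sketch of the perturbed gradient descent idea cited in the snippet [Ge et al., 2015, Jin et al., 2017]: when the gradient is small, add noise drawn uniformly from a small ball, and otherwise descend as usual. The radius, gradient threshold, and the rule of perturbing at most once every `t_noise` iterations are illustrative simplifications of the published algorithms.

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.1, g_thresh=1e-3, radius=1e-2,
                 t_noise=20, n_iters=1000, rng=None):
    """Gradient descent with occasional uniform-ball perturbations near
    stationary points (simplified sketch of perturbed GD)."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    last_noise = -t_noise
    for t in range(n_iters):
        if np.linalg.norm(grad(x)) <= g_thresh and t - last_noise >= t_noise:
            # Sample a point uniformly from the ball of the given radius.
            u = rng.standard_normal(x.shape)
            u *= radius * rng.random() ** (1.0 / x.size) / np.linalg.norm(u)
            x = x + u
            last_noise = t
        x = x - eta * grad(x)
    return x
```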

On the diffusion approximation of nonconvex stochastic gradient descent [article]

Wenqing Hu, Chris Junchi Li, Lei Li, Jian-Guo Liu
2018 arXiv   pre-print
We study the Stochastic Gradient Descent (SGD) method in nonconvex optimization problems from the point of view of approximating diffusion processes.  ...  (resp. saddle point): it escapes in a number of iterations exponentially (resp. almost linearly) dependent on the inverse stepsize.  ...  A. On weak approximation of diffusion process to stochastic gradient descent: In the main text, we have considered the stochastic gradient descent (SGD) iteration $x^{(k)} = x^{(k-1)} - \eta \nabla f(x^{(k-1)}, \zeta_k)$,  ...
arXiv:1705.07562v2 fatcat:wvsq22vh6rgjzpt76aiqzxpv6q
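Written out, the recursion quoted in the snippet and its standard diffusion approximation take the following form, where F is the expected objective and Σ the covariance of the stochastic gradient noise; this is the generic weak approximation used in this literature, stated here for orientation rather than as the paper's exact theorem.

```latex
% SGD recursion and its generic diffusion (SDE) approximation
\begin{align}
  x^{(k)} &= x^{(k-1)} - \eta\,\nabla f\bigl(x^{(k-1)}, \zeta_k\bigr), \\
  dX_t    &= -\nabla F(X_t)\,dt + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,dW_t .
\end{align}
```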

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima [article]

Zeke Xie, Issei Sato, Masashi Sugiyama
2021 arXiv   pre-print
more than sharp minima, while Gradient Descent (GD) with injected white noise favors flat minima only polynomially more than sharp minima.  ...  Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training deep networks in practice. SGD is known to find a flat minimum that often generalizes well.  ...  Stochastic Gradient Noise Analysis.  ... 
arXiv:2002.03495v14 fatcat:tbavmri37jciziarjz2ybnnt5m

Escaping Saddle-Points Faster under Interpolation-like Conditions [article]

Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra
2020 arXiv   pre-print
In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local-minimizers much faster.  ...  We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an over-parametrization setting, the first-order oracle complexity of Perturbed Stochastic Gradient Descent (  ...  But then it is shown that the stuck region is narrow enough so that the iterates escape the saddle points with high probability. We now require a condition on the tail of the stochastic gradient.  ... 
arXiv:2009.13016v1 fatcat:z5j64p3jxvamdipaaeqptkclfi
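One common way interpolation-like assumptions on the stochastic gradients are formalized in the over-parametrized setting is through a strong-growth-type condition, shown below for orientation; the paper's precise assumptions may differ. Here F(x) = E_ξ[f(x; ξ)], so at any stationary point of F every stochastic gradient must essentially vanish, which is the interpolation property.

```latex
% Strong growth condition (a standard interpolation-like assumption)
\mathbb{E}_{\xi}\bigl[\|\nabla f(x;\xi)\|^{2}\bigr] \;\le\; \rho\,\|\nabla F(x)\|^{2}
\quad \text{for all } x .
```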

Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [article]

Stefan Vlaski, Ali H. Sayed
2020 arXiv   pre-print
A key insight in these analyses is that gradient perturbations play a critical role in allowing local descent algorithms to efficiently distinguish desirable from undesirable stationary points and escape  ...  descent and its variations, perform well in converging towards local minima and avoiding saddle-points.  ...  The authors are with the Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne. Emails: {stefan.vlaski, ali.sayed}@epfl.ch.  ... 
arXiv:2003.14366v1 fatcat:42vsyhewprcaln2j7365ehb4zi

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points [article]

Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
2019 arXiv   pre-print
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning.  ...  But these analyses do not take into account the possibility of converging to saddle points.  ...  Stochastic setting with Lipschitz gradient.  ... 
arXiv:1902.04811v2 fatcat:rmdh2zan2vhdxbbzi6if2krnwe

Second-Order Guarantees in Federated Learning [article]

Stefan Vlaski, Elsa Rizk, Ali H. Sayed
2020 arXiv   pre-print
We draw on recent results on the second-order optimality of stochastic gradient algorithms in centralized and decentralized settings, and establish second-order guarantees for a class of federated learning  ...  Nevertheless, most existing analyses are either limited to convex loss functions or only establish first-order stationarity, despite the fact that saddle-points, which are first-order stationary, are  ...  All implementations escape the saddle-point and find a local minimum.  ...
arXiv:2012.01474v1 fatcat:eyxwyialxbcg7dtmywyhnyccgu
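A compact FedAvg-style round for reference: each client runs a few local stochastic-gradient steps and the server averages the resulting models. This is the generic federated averaging scheme, not necessarily the exact class of federated architectures the paper analyzes; the `client_grads` interface and hyperparameters are illustrative.

```python
import numpy as np

def fedavg_round(w, client_grads, eta=0.05, local_steps=5, rng=None):
    """One federated averaging round.

    w            : (d,) global model
    client_grads : list of callables, client_grads[k](w, rng) -> stochastic gradient
    """
    rng = rng or np.random.default_rng(0)
    local_models = []
    for g_k in client_grads:
        w_k = w.copy()
        for _ in range(local_steps):
            w_k = w_k - eta * g_k(w_k, rng)   # local SGD at this client
        local_models.append(w_k)
    return np.mean(local_models, axis=0)      # server aggregates by averaging
```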
Showing results 1 — 15 out of 2,638 results