9 Hits in 7.7 sec

"Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions [article]

Yair Carmon, Oliver Hinder, John C. Duchi, Aaron Sidford
2017 arXiv   pre-print
We develop and analyze a variant of Nesterov's accelerated gradient descent (AGD) for minimization of smooth non-convex functions.  ...  This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions.  ...  Our contributions "Convex until proven guilty" Underpinning our results is the observation that when we run Nesterov's accelerated gradient descent (AGD) on any smooth function f, one of two outcomes  ...
arXiv:1705.02766v1 fatcat:rengq7gfwvd4nobq5ewjpan5u4
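
The mechanism the snippet alludes to can be made concrete: run ordinary AGD and, after each iterate, test the convexity inequality f(u) ≥ f(x) + ⟨∇f(x), u − x⟩ against earlier iterates; any violation certifies that f is not convex, and the witness pair localizes negative curvature. The Python sketch below only illustrates that idea and is not the paper's exact procedure (the function name, step budget, tolerance, and the exhaustive pairwise check are simplifications; the paper's certificate and its negative-curvature exploitation step are more refined).

    # Minimal sketch: standard AGD plus a running convexity check ("convex until proven guilty").
    import numpy as np

    def agd_until_proven_guilty(f, grad, x0, L, steps=100, tol=1e-12):
        x, y, t = x0.copy(), x0.copy(), 1.0
        history = [x0.copy()]
        for _ in range(steps):
            x_new = y - grad(y) / L                          # AGD gradient step
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)    # momentum extrapolation
            x, t = x_new, t_new
            g = grad(x)
            # A convex f must satisfy f(u) >= f(x) + <grad f(x), u - x> for every earlier u;
            # the small tolerance guards against round-off.
            for u in history:
                if f(u) < f(x) + g @ (u - x) - tol:          # convexity violated: "guilty"
                    return x, (u, x)                         # witness pair certifying non-convexity
            history.append(x.copy())
        return x, None                                       # behaved convexly throughout

When a witness pair (u, x) is returned, the segment between u and x must contain a point of negative directional curvature, which is what the full method exploits to keep making progress.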

Cutting plane methods can be extended into nonconvex optimization [article]

Oliver Hinder
2019 arXiv   pre-print
Our techniques utilize the convex until proven guilty principle proposed by Carmon, Duchi, Hinder, and Sidford (2017).  ...  This improves on the best-known epsilon dependence achieved by cubic regularized Newton of O(ϵ^-3/2) as proved by Nesterov and Polyak (2006).  ...  [6], specifically the 'convex until proven guilty principle'.  ...
arXiv:1805.08370v4 fatcat:p7sfjapwfvh6xeormg3kztryny

Lower Bounds for Finding Stationary Points II: First-Order Methods [article]

Yair Carmon, John C. Duchi, Oliver Hinder, Aaron Sidford
2017 arXiv   pre-print
We establish lower bounds on the complexity of finding ϵ-stationary points of smooth, non-convex high-dimensional functions using first-order methods.  ...  For convex functions with Lipschitz gradient, accelerated gradient descent achieves the rate ϵ^-1 log(1/ϵ), showing that finding stationary points is easier given convexity.  ...  Near-achievability of the lower bounds In the paper [9], we propose the method "convex until proven guilty," which augments Nesterov's accelerated gradient method with implicit negative curvature descent  ...
arXiv:1711.00841v1 fatcat:pih7vinkybcxtggmbmzqwibxle
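
The contrast drawn in the abstract can be traced with two standard inequalities (recalled here as a sketch, under the extra assumption ‖x_0 − x*‖ ≤ R; this is not a quotation from the paper). For any L-smooth f with minimum value f*, a single gradient step certifies

    \[
      f\!\Bigl(x - \tfrac{1}{L}\nabla f(x)\Bigr) \;\le\; f(x) - \tfrac{1}{2L}\,\lVert \nabla f(x) \rVert^2
      \quad\Longrightarrow\quad
      \lVert \nabla f(x) \rVert^2 \;\le\; 2L\,\bigl(f(x) - f^\star\bigr),
    \]

and if f is also convex, AGD drives the function gap as f(x_k) − f* ≤ O(LR²/k²), hence

    \[
      \lVert \nabla f(x_k) \rVert \;\le\; O\!\Bigl(\tfrac{L R}{k}\Bigr),
    \]

i.e. an ϵ-stationary point after O(ϵ^-1) iterations, matching the ϵ^-1 log(1/ϵ) rate quoted above up to the logarithmic factor; without convexity no comparable control of the function gap is available, and plain gradient descent needs O(ϵ^-2) iterations.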

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron [article]

Sharan Vaswani, Francis Bach, Mark Schmidt
2019 arXiv   pre-print
Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex  ...  We also show that this condition implies that SGD can find a first-order stationary point as efficiently as full gradient descent in non-convex settings.  ...  Convex until proven guilty: Dimension-free acceleration of gradient descent on non-convex functions. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 654-663.  ... 
arXiv:1810.07288v3 fatcat:mpzdurk4h5hoblnnn2zygex4uu
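
Mechanically, the scheme analyzed here is ordinary stochastic gradients plugged into Nesterov's update with a fixed step size. The sketch below runs it on a realizable (interpolating) least-squares problem, a canonical setting where the growth condition holds; the toy problem, step size, momentum value, and iteration count are illustrative choices, not the paper's constants.

    # Constant step-size SGD with Nesterov momentum on an interpolating least-squares problem.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 20))
    x_star = rng.standard_normal(20)
    b = A @ x_star                               # realizable: one model fits every data point

    def stochastic_grad(w):
        i = rng.integers(len(b))                 # one uniformly sampled data point
        return (A[i] @ w - b[i]) * A[i]

    eta, beta = 0.001, 0.9                       # constant step size and momentum (illustrative)
    w, v = np.zeros(20), np.zeros(20)
    for _ in range(50000):
        lookahead = w + beta * v                 # Nesterov: evaluate the gradient at the lookahead
        v = beta * v - eta * stochastic_grad(lookahead)
        w = w + v

    print(np.linalg.norm(w - x_star))            # distance to the interpolating solution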

First-Order Algorithms Without Lipschitz Gradient: A Sequential Local Optimization Approach [article]

Junyu Zhang, Mingyi Hong
2020 arXiv   pre-print
First-order algorithms have been popular for solving convex and non-convex optimization problems.  ...  We show that the proposed framework can easily adapt to existing first-order methods such as gradient descent (GD), normalized gradient descent (NGD), accelerated gradient descent (AGD), as well as GD  ...  In this section, we develop another subroutine which adapts the acceleration technique convex until proven guilty (CUPG) recently developed in [6].  ...
arXiv:2010.03194v1 fatcat:l2wur5kefndktezvjjguu2jlke
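
For a sense of what dropping the global Lipschitz-gradient assumption involves, the generic pattern below (a sketch only, not the paper's sequential local optimization framework) runs gradient descent while backtracking a local smoothness estimate, so each step relies on smoothness holding only between the current and the trial iterate; the function name, the halving/doubling schedule, and the x^4 example are illustrative.

    # Gradient descent with a backtracked local smoothness estimate (generic sketch).
    import numpy as np

    def gd_local_smoothness(f, grad, x0, L0=1.0, tol=1e-6, max_iter=10000):
        x, L = x0.copy(), L0
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol:
                break
            L = max(L / 2.0, 1e-12)              # optimistically shrink the local estimate
            while True:
                x_trial = x - g / L
                # Accept once the decrease predicted by L-smoothness holds on this step.
                if f(x_trial) <= f(x) - np.linalg.norm(g) ** 2 / (2.0 * L):
                    break
                L *= 2.0                         # estimate was too small; backtrack
            x = x_trial
        return x

    # f(x) = x^4 has no globally Lipschitz gradient, yet the scheme still converges.
    x_min = gd_local_smoothness(lambda x: np.sum(x ** 4), lambda x: 4.0 * x ** 3, np.array([3.0]))
    print(x_min)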

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent [article]

Chi Jin, Praneeth Netrapalli, Michael I. Jordan
2017 arXiv   pre-print
Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves faster convergence rate than gradient descent (GD) in the convex setting.  ...  Our analysis is based on two key ideas: (1) the use of a simple Hamiltonian function, inspired by a continuous-time perspective, which AGD monotonically decreases per step even for nonconvex functions,  ...  Convex until Proven Guilty: Dimension-free acceleration of gradient descent on non-convex functions. arXiv preprint arXiv:1705.02766, 2017. Coralia Cartis, Nicholas Gould, and Ph L Toint.  ... 
arXiv:1711.10456v1 fatcat:pkiddxkz6nfwzenzxqqptcrluu
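
For reference, the "simple Hamiltonian function" in key idea (1) couples the objective with the kinetic energy of the momentum term; up to the paper's exact constants it reads

    \[
      E_t \;=\; f(x_t) \;+\; \frac{1}{2\eta}\,\lVert v_t \rVert^2,
      \qquad v_t = x_t - x_{t-1},
    \]

where η is the step size. Because the algorithm drives E_t down at every iteration even without convexity, the Hamiltonian plays the role that the function value itself plays in the analysis of plain gradient descent.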

Last-iterate convergence rates for min-max optimization [article]

Jacob Abernethy, Kevin A. Lai, Andre Wibisono
2019 arXiv   pre-print
In this work, we show that the Hamiltonian Gradient Descent (HGD) algorithm achieves linear convergence in a variety of more general settings, including convex-concave problems that satisfy a "sufficiently  ...  Proving last-iterate convergence is challenging because many natural algorithms, such as Simultaneous Gradient Descent/Ascent, provably diverge or cycle even in simple convex-concave min-max settings,  ...  "Convex until proven guilty": Dimension-free acceleration of gradient descent on non-convex functions. In International Conference on Machine Learning (ICML), 2017. [DH19] Simon S Du and Wei Hu.  ... 
arXiv:1906.02027v3 fatcat:sfbwjzoutzdcbgum5obh6iuj2m
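
Hamiltonian Gradient Descent is plain gradient descent on the squared norm of the game's vector field, H(z) = ½‖ξ(z)‖² with ξ(x, y) = (∇_x f, −∇_y f); in general ∇H is assembled from Hessian-vector products. The sketch below runs it on a bilinear game, where simultaneous gradient descent/ascent provably cycles or diverges; the closed-form gradients of H are specific to this toy example.

    # Hamiltonian Gradient Descent on the bilinear game f(x, y) = x^T A y.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    x, y = rng.standard_normal(5), rng.standard_normal(5)

    # Vector field xi(x, y) = (grad_x f, -grad_y f) = (A y, -A^T x), so
    # H(x, y) = 0.5 * (||A y||^2 + ||A^T x||^2) and grad H = (A A^T x, A^T A y).
    eta = 0.5 / np.linalg.norm(A, 2) ** 2        # step size below 1/L for the quadratic H
    for _ in range(5000):
        x = x - eta * (A @ (A.T @ x))
        y = y - eta * (A.T @ (A @ y))

    # Both norms shrink monotonically toward zero, the game's unique equilibrium.
    print(np.linalg.norm(A.T @ x), np.linalg.norm(A @ y))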

An accelerated first-order method for non-convex optimization on manifolds [article]

Christopher Criscitiello, Nicolas Boumal
2021 arXiv   pre-print
We describe the first gradient methods on Riemannian manifolds to achieve accelerated rates in the non-convex case.  ...  Under Lipschitz assumptions on the Riemannian gradient and Hessian of the cost function, these methods find approximate first-order critical points faster than regular gradient descent.  ...  This is the idea behind the "convex until proven guilty" paradigm developed by Carmon et al. (2017) and also exploited by Jin et al. (2018).  ...
arXiv:2008.02252v2 fatcat:zq75ihbovjb7tglhusu5plxbre

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations [article]

Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan
2020 arXiv   pre-print
Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond.  ...  We design an algorithm which finds an ϵ-approximate stationary point (with ‖∇F(x)‖ ≤ ϵ) using O(ϵ^-3) stochastic gradient and Hessian-vector products, matching guarantees that were previously available only  ...  Convex until proven guilty: Dimension-free acceleration of gradient descent on non-convex functions. In Proceedings of the 34th International Conference on Machine Learning, pages 654–663, 2017.  ...
arXiv:2006.13476v1 fatcat:ayf2gykjpfeldogyxjgmstraam
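
The "stochastic gradient and Hessian-vector products" counted in the O(ϵ^-3) bound never require forming a Hessian: a Hessian-vector product costs roughly two gradient evaluations. The snippet below shows the standard finite-difference realization of that oracle (a generic sketch, not the paper's algorithm; exact products via automatic differentiation are equally common).

    # Hessian-vector product from two gradient calls: H(x) v ~ (grad(x + r v) - grad(x - r v)) / (2 r).
    import numpy as np

    def hessian_vector_product(grad, x, v, r=1e-5):
        return (grad(x + r * v) - grad(x - r * v)) / (2.0 * r)

    # Sanity check on f(x) = 0.5 * x^T Q x, whose Hessian is exactly Q.
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((4, 4))
    Q = 0.5 * (Q + Q.T)
    grad_f = lambda x: Q @ x
    x, v = rng.standard_normal(4), rng.standard_normal(4)
    print(np.linalg.norm(hessian_vector_product(grad_f, x, v) - Q @ v))   # ~ floating-point error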