"Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions [article]

Yair Carmon, Oliver Hinder, John C. Duchi, Aaron Sidford
2017 arXiv pre-print
We develop and analyze a variant of Nesterov's accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function was convex, or we produce a certificate that the function is "guilty" of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function $f$ with Lipschitz continuous gradient and Hessian, we compute a point $x$ with $\|\nabla f(x)\| \le \epsilon$ in $O(\epsilon^{-7/4} \log(1/\epsilon))$ gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only $O(\epsilon^{-5/3} \log(1/\epsilon))$ evaluations.
arXiv:1705.02766v1 fatcat:rengq7gfwvd4nobq5ewjpan5u4
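
The "converges or is proven guilty" dichotomy in the abstract can be illustrated with a small sketch. This is a simplified illustration, not the authors' algorithm: it runs a standard Nesterov-style AGD loop and, at each step, tests the first-order convexity inequality f(u) >= f(y) + <∇f(y), u - y> on the current iterates. If the inequality fails, the pair (u, y) is returned as a witness of non-convexity. The function name `agd_until_guilty`, the tolerance `tol`, and the choice of which pairs to test are assumptions made here for illustration; the paper's method additionally uses the certificate to exploit negative curvature, which this sketch does not attempt.

```python
import numpy as np

def agd_until_guilty(f, grad, x0, L, eps, max_iter=10_000, tol=1e-12):
    """Hypothetical sketch: AGD that stops on a small gradient or on a
    violated convexity inequality. L is an assumed gradient Lipschitz constant."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(max_iter):
        g = grad(y)
        if np.linalg.norm(g) <= eps:
            return "converged", y, None
        x_next = y - g / L  # gradient step from the extrapolated point
        # A convex f satisfies f(u) >= f(y) + <g, u - y> for every u.
        for u in (x, x_next):
            if f(u) < f(y) + g @ (u - y) - tol:
                return "guilty", y, (u, y)  # witness pair of non-convexity
        # Standard Nesterov momentum schedule.
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return "max_iter", y, None

# Usage on a non-convex example: f(x) = cos(x), whose gradient is 1-Lipschitz.
status, point, witness = agd_until_guilty(
    lambda x: np.sum(np.cos(x)), lambda x: -np.sin(x), [0.5], L=1.0, eps=1e-6
)
```

On a convex function the inequality can never fail (up to rounding), so the loop behaves like ordinary AGD; on a non-convex one, the returned pair pins down where convexity breaks.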