PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization

Songtao Lu, Mingyi Hong, Zhengdao Wang
2019 International Conference on Machine Learning  
Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points (SOSPs) with a global sublinear rate. Existing analyses of A-GD type algorithms either only guarantee convergence to first-order solutions, or converge to second-order solutions asymptotically (without rates). To the best of our knowledge, this is the first alternating type algorithm that takes O(polylog(d)/ϵ²) iterations to achieve an (ϵ, √ϵ)-SOSP with high probability, where polylog(d) denotes a polynomial of the logarithm of the problem dimension d.
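The abstract only sketches the method: gradient steps alternate over two variable blocks, and a small random perturbation is injected to escape strict saddle points so that the iterates reach an (ϵ, √ϵ)-SOSP. Below is a minimal illustrative sketch of that idea, not the paper's exact algorithm; the function names (grad_x, grad_y) and the perturbation schedule (perturb_radius, perturb_interval, Gaussian noise triggered when the gradient is small) are assumptions chosen for illustration, and the paper should be consulted for the precise step sizes and conditions behind the stated rate.

import numpy as np

def pa_gd(grad_x, grad_y, x0, y0, step=1e-2, eps=1e-3,
          perturb_radius=1e-3, perturb_interval=100, max_iter=10000, seed=0):
    """Illustrative sketch of perturbed alternating gradient descent (PA-GD).

    grad_x(x, y), grad_y(x, y): gradients of the objective with respect to
    each block. The perturbation rule below is an assumed schedule, not the
    paper's exact procedure.
    """
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    last_perturb = -perturb_interval
    for t in range(max_iter):
        gx, gy = grad_x(x, y), grad_y(x, y)
        grad_norm = np.sqrt(np.sum(gx**2) + np.sum(gy**2))
        if grad_norm <= eps and t - last_perturb >= perturb_interval:
            # Near a first-order stationary point: add a small Gaussian
            # perturbation to both blocks to help escape strict saddles.
            x = x + perturb_radius * rng.standard_normal(x.shape)
            y = y + perturb_radius * rng.standard_normal(y.shape)
            last_perturb = t
            continue
        # Alternating updates: x uses the current y, then y uses the new x.
        x = x - step * gx
        y = y - step * grad_y(x, y)
    return x, y

For example, running this sketch on a simple nonconvex objective such as f(x, y) = (x·y - 1)² with grad_x = 2(x·y - 1)·y and grad_y = 2(x·y - 1)·x illustrates how the alternating steps and occasional perturbations behave; this toy example is purely for illustration and does not appear in the paper.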