A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been sufficiently studied. In this paper, we provide a sharp analysis of a recently proposed adaptive gradient method namely partially adaptive momentum estimation method (Padam) (Chen and Gu, 2018), which admits many existing adaptive gradient methods such as RMSProp and AMSGrad as special cases. Our analysis shows that, for smootharXiv:1808.05671v2 fatcat:k437chfxc5erxosw4or7p75djy