A copy of this work was available on the public web and has been preserved in the Wayback Machine (captured 2020). File type: application/pdf.
AdaSGD: Bridging the gap between SGD and Adam
[article] arXiv pre-print, 2020
In the context of stochastic gradient descent (SGD) and adaptive moment estimation (Adam), researchers have recently proposed optimization techniques that transition from Adam to SGD with the goal of improving both convergence and generalization performance. However, precisely how each approach trades off early progress against generalization is not well understood; thus, it is unclear when, or even if, one should transition from one approach to the other. In this work, by first studying the convex
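The abstract truncates mid-sentence, so the paper's own AdaSGD algorithm is not specified here. As a rough illustration of the general Adam-to-SGD handoff idea the abstract refers to (not the paper's method), the following minimal PyTorch sketch switches optimizers at a fixed epoch; the model, switch epoch, and learning rates are illustrative assumptions.

import torch
import torch.nn as nn

# Toy model and loss; stand-ins for whatever task is being trained.
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

switch_epoch = 50  # hypothetical handoff point, not from the paper
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(100):
    if epoch == switch_epoch:
        # Hand off to plain SGD; Adam's moment estimates are discarded.
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    x = torch.randn(32, 10)   # synthetic batch for illustration
    y = torch.randn(32, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

One design question such schemes must answer, which the abstract highlights, is how to choose the switch point: switching too early forfeits Adam's fast initial progress, while switching too late may forfeit SGD's generalization benefit.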
arXiv:2006.16541v1
fatcat:hd2jkgswlzbiffvd4t5xvf7fqy