A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2020.
File type: application/pdf.
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
2020
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Adaptive gradient methods, which use historical gradient information to automatically adjust the learning rate, enjoy fast convergence but have been observed to generalize worse than stochastic gradient descent (SGD) with momentum when training deep neural networks. How to close this generalization gap of adaptive gradient methods remains an open problem. In this work, we show that adaptive gradient methods such as Adam and AMSGrad are sometimes "over-adapted". We design a …
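For context on the methods the abstract names, below is a minimal NumPy sketch of a single AMSGrad step (which reduces to Adam when the elementwise-max line is dropped). This is an illustrative sketch of the well-known update rules, not code from the paper; all names (amsgrad_step, lr, beta1, beta2, eps) are hypothetical.

    import numpy as np

    def amsgrad_step(theta, g, m, v, v_hat, t,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One AMSGrad update on parameters theta given gradient g.

        m, v  : exponential moving averages of g and g**2
        v_hat : running elementwise max of v (the AMSGrad correction)
        t     : step count, starting at 1 (used for bias correction)
        """
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2   # second-moment estimate
        v_hat = np.maximum(v_hat, v)           # keep v non-decreasing (AMSGrad)
        m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v, v_hat

Called in a loop over minibatches with m, v, and v_hat initialized to zeros and t counting up from 1, this applies the per-coordinate adaptive learning rate the abstract refers to: coordinates with large historical gradients (large v_hat) take smaller effective steps.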
doi:10.24963/ijcai.2020/448
dblp:conf/ijcai/ZhangCS20
fatcat:lo7lunpacnbppirpuu3l2zp3ha