Using Curvature Information for Fast Stochastic Search

Genevieve B. Orr, Todd K. Leen
1996 Neural Information Processing Systems  
We present an algorithm for fast stochastic gradient descent that uses a nonlinear adaptive momentum scheme to optimize the late-time convergence rate. The algorithm makes effective use of curvature information, requires only O(n) storage and computation, and delivers convergence rates close to the theoretical optimum. We demonstrate the technique on linear and large nonlinear backprop networks.
dblp:conf/nips/OrrL96 fatcat:uiwsuxbydnhgjffwdtz37icxru
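
For orientation, the setting named in the abstract (stochastic gradient descent with a momentum term and O(n) extra storage) can be sketched on a small linear least-squares problem. This is only a minimal illustration using ordinary constant momentum; it is not the nonlinear adaptive momentum scheme of the paper, which the abstract does not specify. The function names `sgd_momentum` and `make_data`, the step size `lr`, the momentum coefficient `beta`, and the synthetic data are all assumptions made for the example.

```python
import random


def sgd_momentum(samples, lr=0.01, beta=0.9, epochs=50, seed=0):
    """Fit w in y ~ w . x by per-sample SGD with constant momentum.

    Hypothetical helper: constant beta stands in for the paper's
    adaptive momentum, which is not reproduced here.
    """
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [0.0] * n
    v = [0.0] * n  # velocity vector: the only extra state, O(n) storage
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit samples in random order each epoch
        for x, y in data:
            # Prediction error on a single sample (squared-error loss).
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(n):
                grad_i = err * x[i]               # stochastic gradient component
                v[i] = beta * v[i] - lr * grad_i  # momentum update
                w[i] += v[i]
    return w


def make_data(true_w, count=200, seed=1):
    """Noise-free synthetic linear data (an assumption for the demo)."""
    rng = random.Random(seed)
    data = []
    for _ in range(count):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x))
        data.append((x, y))
    return data


true_w = [2.0, -3.0]
w = sgd_momentum(make_data(true_w))
print(w)
```

Each update touches every weight once and keeps one velocity value per weight, so both storage and per-step computation grow linearly with the number of weights. A curvature-aware scheme such as the one the abstract describes would replace the fixed `beta` with something adapted during training.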