Rprop Using the Natural Gradient [chapter]

Christian Igel, Marc Toussaint, Wan Weishui
2005 Trends and Applications in Constructive Approximation  
Gradient-based optimization algorithms are the standard methods for adapting the weights of neural networks. The natural gradient gives the steepest descent direction with respect to a non-Euclidean metric on weight space that is, from a theoretical point of view, more appropriate than the Euclidean one. While the natural gradient has already proven advantageous for online learning, we explore its benefits for batch learning: we empirically compare Rprop (resilient backpropagation), one of the best performing first-order learning algorithms, using the Euclidean and the non-Euclidean metric, respectively. As batch steepest descent along the natural gradient is closely related to Levenberg-Marquardt optimization, we add this method to our comparison. It turns out that Rprop can indeed profit from the natural gradient: the optimization speed, measured in weight updates, can increase significantly compared to the original version. Rprop based on the non-Euclidean metric performs at least as well as Levenberg-Marquardt optimization on the two benchmark problems considered and appears to be slightly more robust. However, in both Levenberg-Marquardt optimization and Rprop using the natural gradient, computing a weight update requires cubic time and quadratic space in the number of weights. Furthermore, both methods have additional hyperparameters that are difficult to adjust. In contrast, conventional Rprop has linear time and space complexity, and its hyperparameters require no difficult tuning.
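
The sign-based update at the core of Rprop can be stated compactly. Below is a minimal NumPy sketch of one batch update, shown here in the variant that zeroes the stored gradient after a sign change; the function name, the array-based interface, and the default hyperparameter values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One sign-based Rprop update with per-weight step sizes.

    All arguments are NumPy arrays of the same shape. The defaults
    (eta+ = 1.2, eta- = 0.5) are the commonly recommended values.
    """
    sign_change = grad * prev_grad
    # Gradient kept its sign: grow the step; sign flipped: shrink it.
    step = np.where(sign_change > 0,
                    np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0,
                    np.maximum(step * eta_minus, step_min), step)
    # Forget the gradient where its sign flipped, which also suppresses
    # the weight update for those weights in this step.
    grad = np.where(sign_change < 0, 0.0, grad)
    # Only the sign of the gradient enters the update, not its magnitude.
    w = w - np.sign(grad) * step
    return w, grad, step
```

In use, `step` would be initialized to a small constant (e.g. 0.1) and `prev_grad` to zeros; because only signs matter, each update is linear in the number of weights, which is the complexity advantage the abstract attributes to conventional Rprop.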
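The complexity remark and the link to Levenberg-Marquardt can also be illustrated. The following sketch uses the Gauss-Newton matrix J^T J as a stand-in for the metric tensor (appropriate for a sum-of-squares error) and a hypothetical `damping` parameter; it is an assumption-laden illustration, not the paper's implementation.

```python
import numpy as np

def natural_direction(jacobian, grad, damping=1e-4):
    """Metric-corrected (natural-gradient-like) descent direction.

    Approximates the metric by the Gauss-Newton matrix J^T J; with
    damping > 0 the resulting step coincides with a Levenberg-Marquardt
    step. Building the n x n metric and solving the linear system costs
    O(n^2) space and O(n^3) time in the number of weights n, which is
    the cost the abstract mentions.
    """
    n = jacobian.shape[1]
    metric = jacobian.T @ jacobian + damping * np.eye(n)
    return np.linalg.solve(metric, grad)
```

Rprop using the natural gradient would then feed this metric-corrected direction, rather than the raw gradient, into a sign-based update like the one sketched above.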
doi:10.1007/3-7643-7356-3_19