Reducing computational costs in deep learning on almost linearly separable training data

I.M. Kulikovskikh
2020 Computer Optics  
Previous research in deep learning indicates that gradient descent iterations on separable data converge toward the L2 maximum-margin solution. Even in the absence of explicit regularization, the decision boundary continues to change after the classification error on the training set reaches zero. This so-called "implicit regularization" allows gradient methods to use more aggressive learning rates, which results in substantial computational savings. However, even if the gradient descent method generalizes well as it moves toward the optimal solution, the rate of convergence to this solution is much slower than the rate of convergence of the loss function itself with a fixed step size. The present study puts forward a generalized logistic loss function that involves the optimization of hyperparameters, which results in a faster convergence rate while keeping the same regret bound as the gradient descent method. The results of computational experiments on the MNIST and Fashion MNIST benchmark datasets for image classification demonstrated the viability of the proposed approach to reducing computational costs and outlined directions for future research.
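The behaviour described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical five-point, linearly separable 2D dataset, a linear classifier without a bias term, and the standard logistic loss (not the generalized logistic loss proposed in the paper, whose form is not given here). The names `POINTS` and `run_gd` are illustrative only. With a fixed step size, the training error is already zero at the first checkpoint, yet the loss keeps shrinking, the weight norm keeps growing, and the direction of the weight vector (the decision boundary) keeps drifting.

```python
import math

# Hypothetical toy data (an assumption for illustration): (x, y) pairs that are
# linearly separable by a line through the origin.
POINTS = [
    ((1.0, 0.2), 1),
    ((2.0, 2.0), 1),
    ((3.0, 1.0), 1),
    ((-0.2, -1.0), -1),
    ((-2.0, -3.0), -1),
]


def logistic_loss(margin):
    """Numerically stable log(1 + exp(-margin))."""
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def sigmoid_neg(margin):
    """Numerically stable 1 / (1 + exp(margin))."""
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def run_gd(steps, lr=0.5, checkpoints=(10, 5000)):
    """Plain gradient descent on the mean logistic loss, starting from w = 0.

    Returns a dict mapping each checkpoint step to summary statistics.
    """
    w = [0.0, 0.0]
    n = len(POINTS)
    stats = {}
    for t in range(1, steps + 1):
        grad = [0.0, 0.0]
        for x, y in POINTS:
            margin = y * (w[0] * x[0] + w[1] * x[1])
            s = sigmoid_neg(margin)
            # d/dw log(1 + exp(-y w.x)) = -y * x * sigmoid(-y w.x)
            grad[0] -= y * x[0] * s / n
            grad[1] -= y * x[1] * s / n
        w[0] -= lr * grad[0]
        w[1] -= lr * grad[1]
        if t in checkpoints:
            margins = [y * (w[0] * x[0] + w[1] * x[1]) for x, y in POINTS]
            stats[t] = {
                "loss": sum(logistic_loss(m) for m in margins) / n,
                "errors": sum(1 for m in margins if m <= 0),
                "norm": math.hypot(w[0], w[1]),
                "ratio": w[1] / w[0],  # proxy for the boundary direction
            }
    return stats


if __name__ == "__main__":
    for step, s in sorted(run_gd(5000).items()):
        print(step, s)
```

Comparing the two checkpoints shows the gap the abstract refers to: the loss and training error settle quickly, while the direction of the weight vector is still moving thousands of iterations later.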
doi:10.18287/2412-6179-co-645 fatcat:bmbhlducl5fntldbmylqdtnway