Accelerating Symmetric Rank-1 Quasi-Newton Method with Nesterov's Gradient for Training Neural Networks
2021
Algorithms
Thus, this paper aims to investigate accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov's gradient for training neural networks, and to briefly discuss its convergence. ...
The BFGS quasi-Newton method is the most commonly studied second order method for neural network training. ...
We thus propose a new limited memory Nesterov's accelerated symmetric rank-1 method (L-SR1-N) for training neural networks. ...
doi:10.3390/a15010006
fatcat:lls5teygwfaxnkxhn5lqbgmeie
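The abstract above describes accelerating the SR1 update with Nesterov's gradient. The following NumPy sketch illustrates the general idea of one such step, with the gradient evaluated at the momentum look-ahead point; the momentum coefficient mu, the fixed step size lr, and the full-memory form are illustrative assumptions, not the paper's L-SR1-N implementation.

import numpy as np

def sr1_nesterov_step(w, v, B, grad, mu=0.9, lr=0.1, eps=1e-8):
    # Gradient at the Nesterov look-ahead point
    g = grad(w + mu * v)
    # Quasi-Newton direction using the Hessian approximation B
    d = -np.linalg.solve(B, g)
    v_new = mu * v + lr * d
    w_new = w + v_new
    # SR1 secant update of B from the curvature pair (s, y)
    s = w_new - (w + mu * v)
    y = grad(w_new) - g
    r = y - B @ s
    denom = r @ s
    # Standard SR1 safeguard against a vanishing denominator
    if abs(denom) > eps * np.linalg.norm(r) * np.linalg.norm(s):
        B = B + np.outer(r, r) / denom
    return w_new, v_new, B

# Toy usage on an assumed quadratic f(w) = 0.5 * w @ A @ w
A = np.diag([1.0, 10.0])
grad = lambda w: A @ w
w, v, B = np.array([5.0, 5.0]), np.zeros(2), np.eye(2)
for _ in range(50):
    w, v, B = sr1_nesterov_step(w, v, B, grad)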
Implementation of a Modified Nesterov's Accelerated Quasi-Newton Method on Tensorflow
2018
2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)
Recent studies incorporate Nesterov's accelerated gradient method for the acceleration of gradient based training. ...
The Nesterov's Accelerated Quasi-Newton (NAQ) method has been shown to drastically improve the convergence speed compared to the conventional quasi-Newton method. ...
Zhang at Carleton University, Canada, for his support of microwave circuit models. ...
doi:10.1109/icmla.2018.00185
dblp:conf/icmla/Indrapriyadarsini18
fatcat:tjovczhvj5hmvnf3rouy6m5u5y
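For context, a compact sketch of a NAQ-style iteration is given below: the quasi-Newton direction is computed from the gradient at the Nesterov look-ahead point, and the inverse-Hessian approximation H is refreshed with a BFGS update on the resulting curvature pair. The fixed momentum coefficient mu, unit step size, and absence of a line search are simplifying assumptions, not the paper's TensorFlow implementation.

import numpy as np

def naq_step(w, v, H, grad, mu=0.85, lr=1.0):
    # Gradient at the look-ahead point w + mu * v
    w_hat = w + mu * v
    g_hat = grad(w_hat)
    # Momentum term plus scaled quasi-Newton direction
    v_new = mu * v - lr * (H @ g_hat)
    w_new = w + v_new
    # BFGS update of the inverse Hessian H from the pair (s, y)
    s = w_new - w_hat
    y = grad(w_new) - g_hat
    sy = s @ y
    if sy > 1e-10:  # curvature condition safeguard
        rho = 1.0 / sy
        I = np.eye(len(w))
        V = I - rho * np.outer(s, y)
        H = V @ H @ V.T + rho * np.outer(s, s)
    return w_new, v_new, H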
A Robust Quasi-Newton Training with Adaptive Momentum for Microwave Circuit Models in Neural Networks
2020
Journal of Signal Processing
In this paper, we describe a robust technique based on the quasi-Newton method (QN) using an adaptive momentum term to train neural networks. ...
QN-based algorithms are commonly used for these purposes. The Nesterov's accelerated quasi-Newton method (NAQ) was proposed as a way to accelerate the QN using a fixed momentum coefficient. ...
Zhang at Carleton University, Canada, for providing microwave circuit models. This work was supported by Japan Society for the Promotion of Science (JSPS), KAKENHI (17K00350). ...
doi:10.2299/jsp.24.11
fatcat:hubaiivem5g4xcyt7tk6inplx4
Momentum acceleration of quasi-Newton based optimization technique for neural network training
2021
Nonlinear Theory and Its Applications IEICE
This paper describes a momentum acceleration technique for quasi-Newton (QN) based neural network training and verifies its performance and computational complexity. ...
Recently, Nesterov's accelerated quasi-Newton method (NAQ) has been introduced, showing that the momentum term is effective in reducing the number of iterations and the total training time by incorporating ...
Zhang at Carleton University, Canada, for his support on the microwave circuit models. ...
doi:10.1587/nolta.12.554
fatcat:mvvks7eci5gg3pehrgmlaszgge
Online Regularized Nonlinear Acceleration
[article]
2019
arXiv
pre-print
The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is comparable to that of quasi-Newton methods. ...
However, RNA cannot accelerate faster multistep algorithms like Nesterov's method and often diverges in this context. ...
Edouard Oyallon was partially supported by a postdoctoral grant from DPEI of Inria (AAR 2017POD057) for the collaboration with CWI. ...
arXiv:1805.09639v2
fatcat:xzysmjgsrjafvhovqwuymdk7ga
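As a rough illustration of the acceleration mechanism discussed above, the sketch below computes a regularized nonlinear acceleration extrapolation from a window of stored iterates; the regularization level lam and the scaling of the Gram matrix are assumed details, and the online/adaptive aspects of the paper are omitted.

import numpy as np

def rna_extrapolate(xs, lam=1e-8):
    # Columns are the stored iterates x_0 ... x_k
    X = np.stack(xs, axis=1)
    # Residual matrix with columns r_i = x_{i+1} - x_i
    R = X[:, 1:] - X[:, :-1]
    G = R.T @ R
    G = G / (np.linalg.norm(G, 2) + 1e-16)   # scale before regularizing (assumed detail)
    k = G.shape[0]
    # Regularized least-squares weights constrained to sum to one
    z = np.linalg.solve(G + lam * np.eye(k), np.ones(k))
    c = z / z.sum()
    # Extrapolated point: weighted combination of the iterates
    return X[:, :-1] @ c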
Preconditioned Stochastic Gradient Descent
2018
IEEE Transactions on Neural Networks and Learning Systems
in a way comparable to the Newton method for deterministic optimization. ...
network or a recurrent neural network requiring extremely long term memories. ...
In neural network training, a number of specialized methods are developed to improve the convergence of SGD, and to name a few, the classic momentum method and Nesterov's accelerated gradient, the RMSProp ...
doi:10.1109/tnnls.2017.2672978
pmid:28362591
fatcat:j3woq662tvfyfmdrjdoxjz65p4
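The generic shape of a preconditioned stochastic gradient step is sketched below with a simple diagonal preconditioner built from running second-moment statistics; this RMSProp-like choice is only a stand-in assumption, since the paper estimates its preconditioner online in a different and more general way.

import numpy as np

def preconditioned_sgd_step(w, g, v, lr=1e-2, beta=0.999, eps=1e-8):
    # Running estimate of per-coordinate second moments of the gradient
    v = beta * v + (1 - beta) * g * g
    # Diagonal preconditioner applied to the stochastic gradient
    P = 1.0 / (np.sqrt(v) + eps)
    return w - lr * P * g, v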
Optimization for deep learning: theory and algorithms
[article]
2019
arXiv
pre-print
Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. ...
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. ...
Srikant, Tian Ding and Dawei Li for discussions on various results reviewed in this article. ...
arXiv:1912.08957v1
fatcat:bdtx2o3qhfhthh2vyohkuwnxxa
Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning
[article]
2017
arXiv
pre-print
The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance reducing ...
We then discuss some of the distinctive features of these optimization problems, focusing on the examples of logistic regression and the training of deep neural networks. ...
It is shown in [80] that Nesterov's accelerated gradient method [60] can be cast as a classical momentum approach. ...
arXiv:1706.10207v1
fatcat:mezejqzn3bgozjhgpafyick3xy
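The snippet's claim that Nesterov's accelerated gradient can be cast as a classical momentum approach can be checked numerically; the sketch below runs both formulations on an assumed quadratic and confirms that, after the change of variables w_hat = w + mu * v, the trajectories coincide. The quadratic, step size, and momentum value are arbitrary illustrative choices.

import numpy as np

A = np.diag([1.0, 25.0])        # assumed quadratic objective f(w) = 0.5 * w @ A @ w
grad = lambda w: A @ w
mu, lr, T = 0.9, 0.02, 100

# Form 1: Nesterov's accelerated gradient with a look-ahead gradient
w, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(T):
    v = mu * v - lr * grad(w + mu * v)
    w = w + v

# Form 2: classical-momentum form on the shifted variable w_hat = w + mu * v
w_hat, m = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(T):
    g = grad(w_hat)
    m = mu * m - lr * g
    w_hat = w_hat + mu * m - lr * g

print(np.allclose(w + mu * v, w_hat))   # True: both forms produce the same trajectory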
Learning to Optimize: A Primer and A Benchmark
[article]
2021
arXiv
pre-print
It automates the design of an optimization method based on its performance on a set of training problems. ...
This data-driven procedure generates methods that can efficiently solve problems similar to those in the training. ...
These methods accelerate the design iterations of many types of ML algorithms, such as random forests, gradient boosting, and neural networks. ...
arXiv:2103.12828v2
fatcat:c75y3wz6cngirb2zpugjk63ymq
A Mini-Block Natural Gradient Method for Deep Neural Networks
[article]
2022
arXiv
pre-print
The training of deep neural networks (DNNs) is currently predominantly done using first-order methods. ...
Recently, effective second-order methods, such as KFAC, K-BFGS, Shampoo, and TNT, have been developed for training DNNs, by preconditioning the stochastic gradient by layer-wise block-diagonal matrices ...
SGD with momentum (SGD-m) (Polyak, 1964) and stochastic versions of Nesterov's accelerated gradient method (Nesterov, 1998), implicitly make use of curvature by choosing step directions that combine ...
arXiv:2202.04124v2
fatcat:cdlkkbn5dbethpf7qmzd7tklvy
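A bare-bones sketch of layer-wise block-diagonal preconditioning is given below: each parameter block keeps its own small curvature estimate and the stochastic gradient is preconditioned block by block. The running outer-product estimate and the damping constant are assumed stand-ins; KFAC, K-BFGS, Shampoo, TNT, and the paper's mini-block method each build these per-block matrices differently.

import numpy as np

def block_preconditioned_step(params, grads, curvs, lr=1e-2, beta=0.95, damping=1e-3):
    new_params = []
    for p, g, C in zip(params, grads, curvs):
        # Per-block curvature estimate (running average of gradient outer products)
        C[...] = beta * C + (1 - beta) * np.outer(g, g)
        # Precondition this block's gradient with its own damped curvature matrix
        step = np.linalg.solve(C + damping * np.eye(len(g)), g)
        new_params.append(p - lr * step)
    return new_params, curvs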
Recent Theoretical Advances in Non-Convex Optimization
[article]
2021
arXiv
pre-print
Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an ...
gradient schemes, and an overview of the stochastic first-order methods. ...
Scheinberg for fruitful discussions and their suggestions which helped to improve the quality of the text. ...
arXiv:2012.06188v3
fatcat:6cwwns3pnba5zbodlhddof6xai
High Dimensional Optimization through the Lens of Machine Learning
[article]
2021
arXiv
pre-print
With this theoretical foundation for stochastic gradient descent and momentum methods, we try to explain why the methods used commonly in the machine learning field are so successful. ...
This thesis reviews numerical optimization methods with machine learning problems in mind. ...
We remark at the outset that many authors [119, 99] propose quasi-natural-gradient methods that are strikingly similar to the quasi-Newton ...
arXiv:2112.15392v1
fatcat:4v4s7z3jyrb6dlhwbgd3mcpwyi
Deep Sparse Coding Using Optimized Linear Expansion of Thresholds
[article]
2017
arXiv
pre-print
We address the problem of reconstructing sparse signals from noisy and compressive measurements using a feed-forward deep neural network (DNN) with an architecture motivated by the iterative shrinkage-thresholding ...
For training, we develop an efficient second-order algorithm, which requires only matrix-vector product computations in every training epoch (Hessian-free optimization) and offers superior convergence ...
Thierry Blu, Chinese University of Hong Kong, for his insights on the LET representation and feedback on the manuscript. ...
arXiv:1705.07290v1
fatcat:vfjz4rpxdnfofcimg4u4iyidli
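The "matrix-vector product only" aspect mentioned above is the core of Hessian-free optimization; the sketch below shows how a Hessian-vector product can be formed without ever building the Hessian and then fed to conjugate gradient. The central finite difference is an assumption made to keep the sketch dependency-free (an autodiff double-backward would be used in practice), and this is not the paper's specific training algorithm.

import numpy as np

def hessian_vector_product(grad, w, v, eps=1e-5):
    # Approximate H @ v by a central finite difference of the gradient
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

def cg_solve_hvp(grad, w, b, iters=20, tol=1e-10):
    # Solve H x = b with conjugate gradient, using only Hessian-vector products
    x = np.zeros_like(b)
    r = b - hessian_vector_product(grad, w, x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hessian_vector_product(grad, w, p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x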
Explicit Convergence Rates of Greedy and Random Quasi-Newton Methods
[article]
2022
arXiv
pre-print
First, we extend Rodomanov and Nesterov's results to random quasi-Newton methods, which include common DFP, BFGS, SR1 methods. ...
Such random methods adopt a random direction for updating the approximate Hessian matrix in each iteration. Second, we focus on the specific quasi-Newton methods: SR1 and BFGS methods. ...
In addition, we compare the running time of each method with a classical first-order method: accelerated gradient descent (AGD), following [23]. ...
arXiv:2104.08764v4
fatcat:qe3fdw7jmzb5topabv7iqop2q4
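The update analyzed in greedy and random quasi-Newton methods can be sketched as follows: the Hessian approximation B is corrected along a chosen direction u using the true Hessian action A @ u. Drawing u at random, as below, gives the random variant; the greedy variant would instead pick the coordinate maximizing u^T B u / u^T A u. The toy quadratic and the scaled-identity initialization are assumptions for illustration.

import numpy as np

def sr1_update_along_direction(B, A, u, eps=1e-10):
    # SR1 correction of B along direction u, using the exact Hessian action A @ u
    r = (A - B) @ u
    denom = r @ u
    if abs(denom) > eps:
        B = B + np.outer(r, r) / denom
    return B

# Toy usage: B is driven toward the true Hessian A (assumed quadratic problem)
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.3], [0.3, 1.0]])
B = 3.0 * np.eye(2)                     # an assumed initialization dominating A
for _ in range(10):
    u = rng.standard_normal(2)
    B = sr1_update_along_direction(B, A, u)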
Stochastic, Distributed and Federated Optimization for Machine Learning
[article]
2017
arXiv
pre-print
First, we propose novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives. ...
In this case, traditional methods are inefficient, as the communication costs inherent in distributed optimization become the bottleneck. ...
Stochastic Quasi-Newton Methods. A third class of new algorithms are the Stochastic quasi-Newton methods [26, 17] . ...
arXiv:1707.01155v1
fatcat:t6uqrmnssrafze6l6c7gk5vcyu
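For the variance-reduction idea mentioned above, the classic SVRG scheme is a representative sketch (not necessarily the thesis's exact variant): an occasional full gradient at a snapshot point serves as a control variate for the per-sample stochastic gradients.

import numpy as np

def svrg(grad_i, w0, n, lr=0.1, epochs=10, inner=None, rng=None):
    # grad_i(w, i) returns the gradient of the i-th component function; n components total
    rng = rng or np.random.default_rng(0)
    inner = inner or n
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            # Control variate: stochastic gradient minus its value at the snapshot
            g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= lr * g
    return w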
Showing results 1 — 15 out of 36 results