
Accelerating Symmetric Rank-1 Quasi-Newton Method with Nesterov's Gradient for Training Neural Networks

S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Takeshi Kamio, Hideki Asai
2021 Algorithms  
Thus, this paper aims to investigate accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov's gradient for training neural networks, and to briefly discuss its convergence.  ...  The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training.  ...  We thus propose a new limited-memory Nesterov's accelerated symmetric rank-1 method (L-SR1-N) for training neural networks.  ... 
doi:10.3390/a15010006 fatcat:lls5teygwfaxnkxhn5lqbgmeie
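The SR1 update at the heart of this result can be illustrated with a generic textbook sketch: a plain full-memory SR1 iteration on a toy quadratic. The function name and the safeguard constant are illustrative choices, and this is not the paper's L-SR1-N algorithm.

```python
import numpy as np

def sr1_minimize(grad, x0, iters=50, tol=1e-8):
    """Plain SR1 quasi-Newton iteration (illustrative helper).

    H approximates the inverse Hessian and is updated with the
    symmetric rank-1 correction H += vv^T / (v.y), where v = s - Hy.
    """
    x = x0.astype(float)
    H = np.eye(len(x))           # initial inverse-Hessian approximation
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - H @ g        # quasi-Newton step (unit step length)
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        v = s - H @ y
        denom = v @ y
        # standard SR1 safeguard: skip the update when the denominator
        # is too small, since SR1 is otherwise numerically unstable
        if abs(denom) > 1e-8 * np.linalg.norm(v) * np.linalg.norm(y):
            H += np.outer(v, v) / denom
        x, g = x_new, g_new
    return x

# toy strongly convex quadratic: grad f(x) = Ax - b, minimizer solves Ax = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = sr1_minimize(lambda x: A @ x - b, np.zeros(2))
```

Unlike BFGS, the SR1 correction does not keep H positive definite, which is why SR1 methods are typically paired with safeguards like the denominator check above.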

Implementation of a Modified Nesterov's Accelerated Quasi-Newton Method on Tensorflow

S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Hideki Asai
2018 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)  
Recent studies incorporate Nesterov's accelerated gradient method to accelerate gradient-based training.  ...  The Nesterov's Accelerated Quasi-Newton (NAQ) method has been shown to drastically improve the convergence speed compared to the conventional quasi-Newton method.  ...  Zhang at Carleton University, Canada, for his support of microwave circuit models.  ... 
doi:10.1109/icmla.2018.00185 dblp:conf/icmla/Indrapriyadarsini18 fatcat:tjovczhvj5hmvnf3rouy6m5u5y
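The core NAQ idea, evaluating the gradient at the momentum look-ahead point before taking a quasi-Newton step, can be sketched loosely as follows. This is a BFGS-based toy version with a fixed momentum coefficient and step size; the function name and constants are illustrative, not the paper's TensorFlow implementation.

```python
import numpy as np

def naq_sketch(grad, w0, mu=0.8, alpha=1.0, iters=100, tol=1e-8):
    """Loose sketch of Nesterov-accelerated quasi-Newton: the BFGS
    direction is built from the gradient at the look-ahead point
    w + mu*(w - w_prev) rather than at w itself."""
    w, w_prev = w0.astype(float), w0.astype(float)
    n = len(w)
    H = np.eye(n)                        # inverse-Hessian approximation
    for _ in range(iters):
        z = w + mu * (w - w_prev)        # Nesterov look-ahead point
        g = grad(z)
        if np.linalg.norm(g) < tol:
            return z
        w_new = z - alpha * (H @ g)      # quasi-Newton step from z
        s, y = w_new - z, grad(w_new) - g
        sy = s @ y
        if sy > 1e-12:                   # BFGS curvature condition
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        w_prev, w = w, w_new
    return w

# toy quadratic: grad f(w) = Aw - b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_hat = naq_sketch(lambda w: A @ w - b, np.zeros(2))
```

The only change from a standard BFGS loop is where the gradient is evaluated, which is what lets NAQ reuse conventional quasi-Newton machinery.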

A Robust Quasi-Newton Training with Adaptive Momentum for Microwave Circuit Models in Neural Networks

Shahrzad Mahboubi, Indrapriyadarsini Sendillkkumaar, Hiroshi Ninomiya, Hideki Asai
2020 Journal of Signal Processing  
In this paper, we describe a robust technique based on the quasi-Newton method (QN) using an adaptive momentum term to train neural networks.  ...  QN-based algorithms are commonly used for these purposes. Nesterov's accelerated quasi-Newton method (NAQ) proposed a way to accelerate the QN method using a fixed momentum coefficient.  ...  Zhang at Carleton University, Canada, for providing the microwave circuit models. This work was supported by the Japan Society for the Promotion of Science (JSPS), KAKENHI (17K00350).  ... 
doi:10.2299/jsp.24.11 fatcat:hubaiivem5g4xcyt7tk6inplx4

Momentum acceleration of quasi-Newton based optimization technique for neural network training

Shahrzad Mahboubi, Indrapriyadarsini S, Hiroshi Ninomiya, Hideki Asai
2021 Nonlinear Theory and Its Applications IEICE  
This paper describes a momentum acceleration technique for quasi-Newton (QN) based neural network training and verifies its performance and computational complexity.  ...  Recently, Nesterov's accelerated quasi-Newton method (NAQ) has been introduced, and it has been shown that the momentum term is effective in reducing the number of iterations and the total training time by incorporating  ...  Zhang at Carleton University, Canada, for his support on the microwave circuit models.  ... 
doi:10.1587/nolta.12.554 fatcat:mvvks7eci5gg3pehrgmlaszgge

Online Regularized Nonlinear Acceleration [article]

Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach
2019 arXiv   pre-print
The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is comparable to that of quasi-Newton methods.  ...  However, RNA cannot accelerate faster multistep algorithms like Nesterov's method and often diverges in this context.  ...  Edouard Oyallon was partially supported by a postdoctoral grant from DPEI of Inria (AAR 2017POD057) for the collaboration with CWI.  ... 
arXiv:1805.09639v2 fatcat:xzysmjgsrjafvhovqwuymdk7ga
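The RNA scheme this abstract refers to admits a compact offline sketch: choose affine weights over past iterates that minimize a regularized norm of the combined residuals, then output the weighted average. The closed-form solve below follows the standard derivation; the regularization constant and normalization are illustrative, and this is not the authors' online algorithm.

```python
import numpy as np

def rna_extrapolate(iterates, residuals, lam=1e-10):
    """Regularized nonlinear acceleration, sketched: find weights c
    summing to 1 that minimize ||R c||^2 + lam*||c||^2, via the
    closed form c ~ (R'R + lam*I)^{-1} 1, then combine the iterates."""
    R = np.column_stack(residuals)
    RtR = R.T @ R
    RtR = RtR / np.linalg.norm(RtR)       # scale-invariant regularization
    k = RtR.shape[0]
    c = np.linalg.solve(RtR + lam * np.eye(k), np.ones(k))
    c = c / c.sum()                       # enforce the affine constraint
    return np.column_stack(iterates) @ c

# accelerate plain gradient descent on f(x) = 0.5 x'Ax - b'x
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x, lr = np.zeros(2), 0.1
iterates, residuals = [], []
for _ in range(5):
    g = A @ x - b                         # gradient doubles as the residual
    iterates.append(x.copy())
    residuals.append(g)
    x = x - lr * g
x_rna = rna_extrapolate(iterates, residuals)
x_star = np.linalg.solve(A, b)
```

Because the residual is affine in x here, an affine combination of iterates with near-zero combined residual lands essentially at the minimizer, far ahead of the plain gradient-descent iterate.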

Preconditioned Stochastic Gradient Descent

Xi-Lin Li
2018 IEEE Transactions on Neural Networks and Learning Systems  
in a way comparable to the Newton method for deterministic optimization.  ...  network or a recurrent neural network requiring extremely long-term memories.  ...  In neural network training, a number of specialized methods have been developed to improve the convergence of SGD; to name a few, the classic momentum method and Nesterov's accelerated gradient, the RMSProp  ... 
doi:10.1109/tnnls.2017.2672978 pmid:28362591 fatcat:j3woq662tvfyfmdrjdoxjz65p4
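Li's PSGD learns its preconditioner online; as a stand-in, here is a minimal sketch of why preconditioning helps on an ill-scaled least-squares problem, using a fixed Jacobi (diagonal-Hessian) preconditioner rather than the paper's learned one. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
scales = np.array([1.0, 10.0, 0.1, 5.0, 1.0])
X = rng.normal(size=(n, d)) * scales      # deliberately ill-scaled features
w_true = rng.normal(size=d)
y = X @ w_true

def gd(precondition, lr, steps=500):
    """Gradient descent on 0.5/n * ||Xw - y||^2.  With precondition=True,
    each step is scaled by the inverse diagonal of the Hessian X'X/n
    (a Jacobi preconditioner standing in for PSGD's learned one)."""
    w = np.zeros(d)
    P = 1.0 / np.mean(X**2, axis=0) if precondition else np.ones(d)
    for _ in range(steps):
        g = X.T @ (X @ w - y) / n         # full-batch gradient
        w -= lr * P * g                   # (optionally) preconditioned step
    return w

w_plain = gd(precondition=False, lr=0.01)  # lr limited by the stiffest direction
w_pre   = gd(precondition=True,  lr=0.5)   # rescaled problem allows a large step
```

After preconditioning, every coordinate sees curvature near 1, so one large step size works; without it the step must respect the stiffest direction, and the flattest one barely moves.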

Optimization for deep learning: theory and algorithms [article]

Ruoyu Sun
2019 arXiv   pre-print
Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms.  ...  When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks.  ...  Srikant, Tian Ding and Dawei Li for discussions on various results reviewed in this article.  ... 
arXiv:1912.08957v1 fatcat:bdtx2o3qhfhthh2vyohkuwnxxa

Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning [article]

Frank E. Curtis, Katya Scheinberg
2017 arXiv   pre-print
The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance reducing  ...  We then discuss some of the distinctive features of these optimization problems, focusing on the examples of logistic regression and the training of deep neural networks.  ...  It is shown in [80] that Nesterov's accelerated gradient method [60] can be cast as a classical momentum approach.  ... 
arXiv:1706.10207v1 fatcat:mezejqzn3bgozjhgpafyick3xy
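The equivalence mentioned in the last snippet, Nesterov's accelerated gradient recast as a momentum-style update, can be checked numerically: the change of variables theta = w + mu*v turns the look-ahead gradient into a gradient at the current iterate. The derivation is standard; the toy quadratic below is illustrative.

```python
import numpy as np

def nag_lookahead(grad, w0, mu=0.9, eta=0.1, iters=30):
    """Nesterov's method in look-ahead form: the gradient is taken at
    w + mu*v before the step.  Returns the look-ahead points."""
    w, v = w0.astype(float), np.zeros_like(w0)
    traj = []
    for _ in range(iters):
        g = grad(w + mu * v)
        v = mu * v - eta * g
        w = w + v
        traj.append(w + mu * v)
    return traj

def nag_momentum_form(grad, w0, mu=0.9, eta=0.1, iters=30):
    """The same method after the change of variables theta = w + mu*v:
    the gradient is now taken at the current iterate, momentum-style."""
    theta, v = w0.astype(float), np.zeros_like(w0)
    traj = []
    for _ in range(iters):
        g = grad(theta)
        v = mu * v - eta * g
        theta = theta - eta * g + mu * v
        traj.append(theta)
    return traj

A = np.diag([1.0, 3.0])
grad = lambda w: A @ w                 # f(w) = 0.5 w'Aw, minimum at 0
w0 = np.array([1.0, -2.0])
traj_a = nag_lookahead(grad, w0)
traj_b = nag_momentum_form(grad, w0)
```

The two trajectories coincide step for step, which is why deep-learning libraries can implement Nesterov momentum without ever forming the look-ahead point explicitly.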

Learning to Optimize: A Primer and A Benchmark [article]

Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zhangyang Wang, Wotao Yin
2021 arXiv   pre-print
It automates the design of an optimization method based on its performance on a set of training problems.  ...  This data-driven procedure generates methods that can efficiently solve problems similar to those in the training.  ...  These methods accelerate the design iterations of many types of ML algorithms, such as random forests, gradient boosting, and neural networks.  ... 
arXiv:2103.12828v2 fatcat:c75y3wz6cngirb2zpugjk63ymq

A Mini-Block Natural Gradient Method for Deep Neural Networks [article]

Achraf Bahamou, Donald Goldfarb, Yi Ren
2022 arXiv   pre-print
The training of deep neural networks (DNNs) is currently predominantly done using first-order methods.  ...  Recently, effective second-order methods, such as KFAC, K-BFGS, Shampoo, and TNT, have been developed for training DNNs, by preconditioning the stochastic gradient by layer-wise block-diagonal matrices  ...  SGD with momentum (SGD-m) (Polyak, 1964) and stochastic versions of Nesterov's accelerated gradient method (Nesterov, 1998) implicitly make use of curvature by choosing step directions that combine  ... 
arXiv:2202.04124v2 fatcat:cdlkkbn5dbethpf7qmzd7tklvy

Recent Theoretical Advances in Non-Convex Optimization [article]

Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev
2021 arXiv   pre-print
Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an  ...  gradient schemes, and an overview of the stochastic first-order methods.  ...  Scheinberg for fruitful discussions and their suggestions which helped to improve the quality of the text.  ... 
arXiv:2012.06188v3 fatcat:6cwwns3pnba5zbodlhddof6xai

High Dimensional Optimization through the Lens of Machine Learning [article]

Felix Benning
2021 arXiv   pre-print
With this theoretical foundation for stochastic gradient descent and momentum methods, we try to explain why the methods commonly used in the machine learning field are so successful.  ...  This thesis reviews numerical optimization methods with machine learning problems in mind.  ...  We remark at the outset that many authors [119, 99] propose quasi-natural-gradient methods that are strikingly similar to the quasi-Newton  ... 
arXiv:2112.15392v1 fatcat:4v4s7z3jyrb6dlhwbgd3mcpwyi

Deep Sparse Coding Using Optimized Linear Expansion of Thresholds [article]

Debabrata Mahapatra, Subhadip Mukherjee, Chandra Sekhar Seelamantula
2017 arXiv   pre-print
We address the problem of reconstructing sparse signals from noisy and compressive measurements using a feed-forward deep neural network (DNN) with an architecture motivated by the iterative shrinkage-thresholding  ...  For training, we develop an efficient second-order algorithm, which requires only matrix-vector product computations in every training epoch (Hessian-free optimization) and offers superior convergence  ...  Thierry Blu, Chinese University of Hong Kong, for his insights on the LET representation and feedback on the manuscript.  ... 
arXiv:1705.07290v1 fatcat:vfjz4rpxdnfofcimg4u4iyidli
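The iterative shrinkage-thresholding algorithm (ISTA) that motivates this paper's architecture alternates a gradient step with soft-thresholding; it is this alternation that gets unrolled into network layers with learnable thresholds. Below is a plain, non-learned ISTA sketch on a toy sparse-recovery problem, with illustrative problem sizes and constants.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of t*||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, iters=500):
    """ISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1: each iteration is
    a gradient step on the quadratic term followed by soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2         # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)
    return x

# toy compressive sensing: recover a 3-sparse vector from 50 measurements
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat = ista(A, y, lam=0.01)
```

One unrolled network layer corresponds to one loop iteration here, with the fixed threshold lam/L replaced by a trainable nonlinearity.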

Explicit Convergence Rates of Greedy and Random Quasi-Newton Methods [article]

Dachao Lin, Haishan Ye, Zhihua Zhang
2022 arXiv   pre-print
First, we extend Rodomanov and Nesterov's results to random quasi-Newton methods, which include the common DFP, BFGS, and SR1 methods.  ...  Such random methods adopt a random direction for updating the approximate Hessian matrix in each iteration. Second, we focus on the specific quasi-Newton methods: the SR1 and BFGS methods.  ...  In addition, we also compare the running time of each method with a classical first-order method, accelerated gradient descent (AGD), following [23].  ... 
arXiv:2104.08764v4 fatcat:qe3fdw7jmzb5topabv7iqop2q4

Stochastic, Distributed and Federated Optimization for Machine Learning [article]

Jakub Konečný
2017 arXiv   pre-print
First, we propose novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives.  ...  In this case, traditional methods are inefficient, as the communication costs inherent in distributed optimization become the bottleneck.  ...  Stochastic Quasi-Newton Methods. A third class of new algorithms is the stochastic quasi-Newton methods [26, 17].  ... 
arXiv:1707.01155v1 fatcat:t6uqrmnssrafze6l6c7gk5vcyu
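The variance-reduction idea behind that linear-convergence claim, as in SVRG-type methods, keeps a full-gradient snapshot and corrects each stochastic gradient with it. Below is a minimal sketch on noiseless least squares; the function names and constants are illustrative, not the thesis's exact algorithms.

```python
import numpy as np

def svrg(grad_i, full_grad, w0, n, lr=0.05, epochs=30, m=None, seed=0):
    """SVRG sketch: inner steps use g_i(w) - g_i(w_snap) + mu, where mu
    is the exact full gradient at the snapshot, so the update's variance
    shrinks to zero as both w and w_snap approach the optimum."""
    rng = np.random.default_rng(seed)
    m = m or 2 * n
    w = w0.astype(float)
    for _ in range(epochs):
        w_snap = w.copy()
        mu = full_grad(w_snap)            # one full pass per epoch
        for _ in range(m):
            i = rng.integers(n)
            g = grad_i(i, w) - grad_i(i, w_snap) + mu
            w = w - lr * g
    return w

# noiseless least squares: f(w) = 1/(2n) * sum_i (x_i.w - y_i)^2
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
grad_i = lambda i, w: (X[i] @ w - y[i]) * X[i]
full_grad = lambda w: X.T @ (X @ w - y) / n
w_hat = svrg(grad_i, full_grad, np.zeros(d), n)
```

Plain SGD with a constant step size stalls at a noise floor; the snapshot correction removes that floor, which is the source of the linear convergence rate.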