
A Linearly-Convergent Stochastic L-BFGS Algorithm [article]

Philipp Moritz, Robert Nishihara, Michael I. Jordan
2016 arXiv   pre-print
We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions.  ...  Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and  ...  Discussion: This paper introduces a stochastic version of L-BFGS and proves a linear rate of convergence in the strongly convex case.  ...
arXiv:1508.02087v2 fatcat:qffeae2oufgzzhxcmfa6jgknze
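
To illustrate the combination this abstract describes, here is a minimal sketch of one iteration pairing an SVRG-style variance-reduced gradient with a quasi-Newton direction; `grad_fn`, `H_apply`, and the snapshot bookkeeping are illustrative stand-ins, not the paper's exact Algorithm 1.

```python
def svrg_lbfgs_step(x, x_snap, full_grad_snap, grad_fn, batch, H_apply, lr):
    """One variance-reduced quasi-Newton step (sketch).

    grad_fn(x, batch): mini-batch gradient at x (1-D numpy array).
    H_apply(v): applies the L-BFGS inverse-Hessian approximation to v,
    e.g. via the standard two-loop recursion.
    """
    # SVRG estimator: unbiased, and its variance vanishes as x -> x_snap.
    g = grad_fn(x, batch) - grad_fn(x_snap, batch) + full_grad_snap
    return x - lr * H_apply(g)
```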

Quasi-Newton Methods: Superlinear Convergence Without Line Searches for Self-Concordant Functions [article]

Wenbo Gao, Donald Goldfarb
2018 arXiv   pre-print
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant  ...  of stochastic gradient descent on stochastic optimization problems.  ...  Then the adaptive L-BFGS method is globally R-linearly convergent.  ... 
arXiv:1612.06965v3 fatcat:5dt4s3uemvatpl7enjv43rdcsm
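
For context, the update the analysis concerns is the classical BFGS inverse-Hessian recursion below, in standard notation; the paper's self-concordance-based step size itself is not reproduced here.

```latex
% Curvature pair and classical BFGS inverse-Hessian update
\begin{aligned}
s_k &= x_{k+1} - x_k, \qquad
y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \qquad
\rho_k = \frac{1}{y_k^{\top} s_k}, \\
H_{k+1} &= \bigl(I - \rho_k s_k y_k^{\top}\bigr)\, H_k \,\bigl(I - \rho_k y_k s_k^{\top}\bigr)
          + \rho_k s_k s_k^{\top}.
\end{aligned}
```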

Quasi-Newton methods: superlinear convergence without line searches for self-concordant functions

Wenbo Gao, Donald Goldfarb
2018 Optimization Methods and Software  
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant  ...  of stochastic gradient descent on stochastic optimization problems.  ...  Then the adaptive L-BFGS method is globally R-linearly convergent.  ... 
doi:10.1080/10556788.2018.1510927 fatcat:xtmmqflsq5grplwt3guuysd6ty

Stochastic Damped L-BFGS with Controlled Norm of the Hessian Approximation [article]

Sanae Lotfi and Tiphaine Bonniot de Ruisselet and Dominique Orban and Andrea Lodi
2020 arXiv   pre-print
Our algorithm, VARCHEN, draws from previous work that proposed a novel stochastic damped L-BFGS algorithm called SdLBFGS.  ...  We propose a new stochastic variance-reduced damped L-BFGS algorithm, where we leverage estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and  ...  [33] proposed a stochastic damped L-BFGS (SdLBFGS) algorithm and proved almost sure convergence to a stationary point.  ... 
arXiv:2012.05783v1 fatcat:nwjtqfjjnbeqjcejbldgaul56i
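
Damped L-BFGS methods such as SdLBFGS keep the update well defined on nonconvex stochastic problems by modifying the curvature pair. The sketch below shows the classical Powell-style damping that such schemes build on; the 0.2 threshold is the conventional choice, not necessarily the rule used by VARCHEN.

```python
def powell_damped_pair(s, y, B_apply, threshold=0.2):
    """Replace y by a damped y_bar so that s^T y_bar stays sufficiently
    positive, keeping the (L-)BFGS update positive definite
    (classical Powell damping). B_apply(v) applies the current
    Hessian approximation B to a vector v.
    """
    Bs = B_apply(s)
    sBs = s @ Bs
    sy = s @ y
    if sy >= threshold * sBs:
        theta = 1.0                           # curvature is already safe
    else:
        theta = (1.0 - threshold) * sBs / (sBs - sy)
    y_bar = theta * y + (1.0 - theta) * Bs    # convex combination of y and B s
    return s, y_bar
```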

Stochastic L-BFGS: Improved Convergence Rates and Practical Acceleration Strategies

Renbo Zhao, William Benjamin Haskell, Vincent Y. F. Tan
2018 IEEE Transactions on Signal Processing  
We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm.  ...  By proposing a new framework for the convergence analysis, we prove improved convergence rates and computational complexities of the stochastic L-BFGS algorithms compared to previous works.  ...  In the literature (Gower et al., 2016; Moritz et al., 2016), usually option I or II (in Algorithm 1) is analyzed theoretically to prove that the stochastic L-BFGS algorithms therein converge linearly  ...
doi:10.1109/tsp.2017.2784360 fatcat:u7jv3vxpave7jkbxnmpm7gcp54

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation [article]

Si Yi Meng, Sharan Vaswani, Issam Laradji, Mark Schmidt, Simon Lacoste-Julien
2020 arXiv   pre-print
We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping.  ...  Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence.  ...  A linearly-convergent stochastic L-BFGS algorithm. In Artificial Intelligence and Statistics, pages 249–258, 2016. Yurii Nesterov. Lectures on convex optimization, volume 137. Springer, 2018.  ...
arXiv:1910.04920v2 fatcat:jfzvxawxdrcp3ocfbnxmzh4fi4
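
"Hessian-free" here means the method only ever needs Hessian-vector products, never the matrix itself. A minimal finite-difference sketch of such a product (`grad_fn` is an assumed gradient oracle; automatic differentiation would give the product exactly):

```python
def hessian_vector_product(grad_fn, x, v, eps=1e-6):
    """Approximate H(x) @ v by a forward difference of gradients, so the
    n x n Hessian is never formed; a conjugate-gradient solver built on
    these products can then solve the (sub-sampled) Newton system."""
    return (grad_fn(x + eps * v) - grad_fn(x)) / eps
```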

Stochastic Block BFGS: Squeezing More Curvature out of Data [article]

Robert M. Gower, Donald Goldfarb, Peter Richtárik
2016 arXiv   pre-print
We propose a novel limited-memory stochastic block BFGS update for incorporating enriched curvature information in stochastic approximation methods.  ...  We propose several sketching strategies, present a new quasi-Newton method that uses stochastic block BFGS updates combined with the variance reduction approach SVRG to compute batch stochastic gradients  ...  Convergence: In this section we prove that Algorithm 1 converges linearly.  ...
arXiv:1603.09649v1 fatcat:7tduh5ikgnchzmcz4n6mvxgo4a
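
The block update generalizes the rank-two BFGS formula by replacing the vector pair (s, y) with an n × q sketching matrix D and the Hessian's action on it. A numpy sketch under that reading (dense for clarity; it reduces to classical BFGS at q = 1):

```python
import numpy as np

def block_bfgs_update(H, D, Delta):
    """Block BFGS inverse-Hessian update (sketch).

    D     : n x q sketching matrix of curvature directions.
    Delta : n x q Hessian action on the sketch, Delta = Hessian @ D.
    """
    M = np.linalg.inv(Delta.T @ D)        # q x q, symmetric, assumed invertible
    P = np.eye(H.shape[0]) - D @ M @ Delta.T
    return P @ H @ P.T + D @ M @ D.T
```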

An Inexact Variable Metric Proximal Point Algorithm for Generic Quasi-Newton Acceleration [article]

Hongzhou Lin, Zaid Harchaoui
2019 arXiv   pre-print
When combined with limited-memory BFGS rules, QNing is particularly effective for solving high-dimensional optimization problems, while enjoying a worst-case linear convergence rate for strongly convex problems  ...  The proposed scheme, called QNing, can be notably applied to incremental first-order methods such as the stochastic variance-reduced gradient descent algorithm (SVRG) and other randomized incremental optimization  ...  This work was supported by the ERC grant SOLARIS (number 714381), a grant from ANR (MACARON project ANR-14-CE23-0003-01), and the program "Learning in Machines and Brains" (CIFAR).  ...
arXiv:1610.00960v4 fatcat:flhlxb6pa5athlm6enwswmy6zq
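
The smoothing behind this style of quasi-Newton acceleration is the Moreau–Yosida envelope: the quasi-Newton method is run on the envelope F rather than on f, and each gradient of F costs one (inexact) proximal subproblem. The standard definitions, with κ > 0 the smoothing parameter:

```latex
% Moreau-Yosida envelope and its gradient (standard definitions)
F(x) = \min_{z \in \mathbb{R}^d} \Bigl\{ f(z) + \tfrac{\kappa}{2}\,\lVert z - x \rVert^2 \Bigr\},
\qquad
\nabla F(x) = \kappa \bigl( x - p(x) \bigr),
\quad
p(x) = \arg\min_{z \in \mathbb{R}^d} \Bigl\{ f(z) + \tfrac{\kappa}{2}\,\lVert z - x \rVert^2 \Bigr\}.
```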

Deep Reinforcement Learning via L-BFGS Optimization [article]

Jacob Rafati, Roummel F. Marcia
2019 arXiv   pre-print
Methods for solving the optimization problems in deep RL are restricted to the class of first-order algorithms, such as stochastic gradient descent (SGD).  ...  The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive definite Hessian approximations.  ...  We used the stochastic line-search L-BFGS method as the optimization method (Algorithm 2).  ...
arXiv:1811.02693v2 fatcat:dgqrwcko5vbmddysmhch5fg244
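
The exact stochastic line-search rule of Algorithm 2 is not shown in the snippet; as a reference point, a generic Armijo backtracking loop (applied to a mini-batch loss in the stochastic setting) looks like this:

```python
def backtracking_line_search(f, grad, x, p, alpha0=1.0, c=1e-4, tau=0.5,
                             max_iter=30):
    """Armijo backtracking: shrink the step until sufficient decrease holds.
    p must be a descent direction for the (mini-batch) objective f."""
    fx = f(x)
    slope = grad(x) @ p            # directional derivative, negative for descent
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + c * alpha * slope:
            break
        alpha *= tau
    return alpha
```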

Asynchronous Parallel Stochastic Quasi-Newton Methods [article]

Qianqian Tong, Guannan Liang, Xingyu Cai, Chunjiang Zhu, Jinbo Bi
2020 arXiv   pre-print
Adopting the variance reduction technique, a prior stochastic L-BFGS, which was not designed for parallel computing, reaches a linear convergence rate.  ...  Unlike prior attempts, which parallelize only the gradient calculation or the two-loop recursion of L-BFGS, our algorithm is the first one that truly parallelizes L-BFGS with a convergence guarantee  ...  L-BFGS) enjoy a superlinear convergence rate, while the stochastic version of quasi-Newton methods (including L-BFGS) will have a sublinear convergence rate in strongly convex optimization as a sacrifice  ...
arXiv:2011.00667v1 fatcat:w2so7l3imna73aoyhwpn52qfeq
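
The two-loop recursion referenced above is the serial bottleneck those parallelization attempts target; it applies the L-BFGS inverse-Hessian approximation using only the m stored curvature pairs (all vectors are 1-D numpy arrays):

```python
def lbfgs_two_loop(g, s_list, y_list):
    """Compute (approximate inverse Hessian) @ g from stored (s, y) pairs
    without forming any n x n matrix (standard two-loop recursion)."""
    q = g.copy()
    alphas, rhos = [], []
    for s, y in zip(reversed(s_list), reversed(y_list)):   # newest to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q = q - a * y
        rhos.append(rho)
        alphas.append(a)
    if s_list:                                 # common H_0 = gamma * I scaling
        s, y = s_list[-1], y_list[-1]
        q = q * ((s @ y) / (y @ y))
    for (s, y), rho, a in zip(zip(s_list, y_list),         # oldest to newest
                              reversed(rhos), reversed(alphas)):
        beta = rho * (y @ q)
        q = q + (a - beta) * s
    return q
```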

Practical Quasi-Newton Methods for Training Deep Neural Networks [article]

Donald Goldfarb, Yi Ren, Achraf Bahamou
2021 arXiv   pre-print
We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs).  ...  Consequently, computing and storing a full n × n BFGS approximation or storing a modest number of (step, change in gradient) vector pairs for use in an L-BFGS implementation is out of the question.  ...  In this section, we prove the convergence of Algorithm 5, a variant of K-BFGS(L). Algorithm 5 is very similar to our actual implementation of K-BFGS(L) (i.e.  ... 
arXiv:2006.08877v3 fatcat:oow4oj6kcvf6bpcvlaakluzlwm
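
The Kronecker factoring is what makes the "out of the question" n × n block tractable: a layer's curvature is approximated as A ⊗ B, and the identity (A ⊗ B)⁻¹ vec(G) = vec(B⁻¹ G A⁻¹) for symmetric factors turns preconditioning into two small solves. A self-contained numpy check of that identity (the factor matrices here are arbitrary positive-definite stand-ins, not the method's actual statistics):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 8, 4                                   # small enough to form the full block
G = rng.standard_normal((p, q))               # gradient of a p x q weight matrix
A = np.eye(q) + 0.1 * np.ones((q, q))         # symmetric positive-definite
B = np.eye(p) + 0.1 * np.ones((p, p))         # Kronecker factors (stand-ins)

# Factored preconditioning: storage/solve cost q^2 + p^2 instead of (p*q)^2.
precond = np.linalg.solve(B, G) @ np.linalg.inv(A)

# Same result via the explicit (p*q) x (p*q) Kronecker block.
# vec() is column stacking, i.e. flattening in Fortran order.
full = np.linalg.solve(np.kron(A, B), G.flatten(order="F"))
assert np.allclose(full, precond.flatten(order="F"))
```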

Stochastic Second-Order Optimization via von Neumann Series [article]

Mojmir Mutny
2017 arXiv   pre-print
A stochastic iterative algorithm approximating second-order information using von Neumann series is discussed. We present convergence guarantees for strongly-convex and smooth functions.  ...  In numerical experiments, the behavior of the error is similar to the second-order algorithm L-BFGS, and improves the performance of LISSA for a quadratic objective function.  ...  Figure 2: A numerical experiment comparing the ISSA, LISSA, BFGS, and L-BFGS algorithms. In (a), τ = 5 for ISSA, and L-BFGS used the last 5 gradients as well.  ...
arXiv:1612.04694v4 fatcat:5mr4n6lhy5crnojym2isn5z6li
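
The series in question: if the eigenvalues of H lie in (0, 2) (which a suitable scaling arranges), then H⁻¹ = Σ_{k≥0} (I − H)^k, and truncating the sum approximates H⁻¹g using matrix-vector products only. A sketch:

```python
def neumann_inverse_apply(H_apply, g, n_terms):
    """Approximate H^{-1} @ g by the truncated von Neumann series
    sum_{k=0}^{n_terms} (I - H)^k g, valid when eig(H) lies in (0, 2).
    Needs only products H_apply(v) = H @ v, never H itself."""
    term = g.copy()                  # (I - H)^0 g
    total = g.copy()
    for _ in range(n_terms):
        term = term - H_apply(term)  # term <- (I - H) @ term
        total = total + term
    return total
```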

Batch-Expansion Training: An Efficient Optimization Framework [article]

Michał Dereziński and Dhruv Mahajan and S. Sathiya Keerthi and S. V. N. Vishwanathan and Markus Weimer
2018 arXiv   pre-print
We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset.  ...  As opposed to stochastic approaches, batches do not need to be resampled i.i.d. at every iteration, thus making BET more resource-efficient in a distributed setting and when disk access is constrained  ...  Furthermore, the time complexity of performing a single iteration for many of those algorithms (including GD and L-BFGS) is linearly proportional to the data size.  ...
arXiv:1704.06731v3 fatcat:zxjm4ezou5aptliuexs6bfcdc4
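
Stripped to its loop structure, the framework warm-starts a batch optimizer on a growing prefix of the data; the growth factor and helper names below are illustrative, not BET's tuned schedule.

```python
def batch_expansion_training(optimize, data, init_size=1024, growth=2.0):
    """Run a batch optimizer (e.g. L-BFGS) on a gradually expanding
    prefix of the dataset. optimize(x0, subset) returns an approximate
    minimizer on that subset; x0=None means a default initialization."""
    x, size = None, init_size
    while True:
        subset = data[: min(int(size), len(data))]
        x = optimize(x, subset)          # warm-start from the previous solve
        if len(subset) == len(data):
            return x                     # finished the full-batch phase
        size *= growth
```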

On the Acceleration of L-BFGS with Second-Order Information and Stochastic Batches [article]

Jie Liu, Yu Rong, Martin Takac, Junzhou Huang
2018 arXiv   pre-print
This paper proposes a framework of L-BFGS based on the (approximate) second-order information with stochastic batches, as a novel approach to finite-sum minimization problems.  ...  Different from the classical L-BFGS where stochastic batches lead to instability, we use a smooth estimate for the evaluations of the gradient differences while achieving acceleration by well-scaling the  ...  Convergence Analysis: In this section, we study the convergence of our stochastic L-BFGS framework.  ...
arXiv:1807.05328v1 fatcat:laxjqpcvr5aarormumbbicqrdm
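
The instability mentioned above arises when the gradient difference y is taken across two different mini-batches, so it measures sampling noise rather than curvature. A common stabilization, which the paper's smoothed estimate refines, differences gradients on the same batch:

```python
def curvature_pair_same_batch(grad_fn, x_new, x_old, batch):
    """Form the (s, y) pair with both gradients evaluated on one
    mini-batch, so y reflects curvature rather than batch noise."""
    s = x_new - x_old
    y = grad_fn(x_new, batch) - grad_fn(x_old, batch)
    return s, y
```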

Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part II [article]

Jiaojiao Zhang, Huikang Liu, Anthony Man-Cho So, Qing Ling
2022 arXiv   pre-print
In Part I of this work, we have proposed a general framework of decentralized stochastic quasi-Newton methods, which converge linearly to the optimal solution under the assumption that the local Hessian  ...  Numerical experiments demonstrate that the proposed quasi-Newton methods are much faster than the existing decentralized stochastic first-order algorithms.  ...  An online limited-memory BFGS using stochastic gradients in lieu of the full gradient in the BFGS update is proposed in [6]; the convergence analysis is given in [7].  ...
arXiv:2201.07733v1 fatcat:fvn7b3mcsngavhwoah3ayipqce
Showing results 1 — 15 out of 1,652 results