A Linearly-Convergent Stochastic L-BFGS Algorithm
[article]
2016
arXiv
pre-print
We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. ...
Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and ...
Discussion This paper introduces a stochastic version of L-BFGS and proves a linear rate of convergence in the strongly convex case. ...
arXiv:1508.02087v2
fatcat:qffeae2oufgzzhxcmfa6jgknze
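The combination this entry describes (an L-BFGS-type metric applied to variance-reduced gradients in the style of Johnson and Zhang's SVRG) can be summarized in a short sketch; the function below is a generic illustration under that reading, not the authors' code, and the helper names are placeholders.

```python
# Generic sketch of an SVRG-style variance-reduced gradient, the ingredient this
# entry combines with an L-BFGS metric; helper names are placeholders.
import numpy as np

def vr_gradient(grad_i, x, x_snap, full_grad_snap, batch):
    """grad_i(w, i): gradient of the i-th component function at w."""
    g      = np.mean([grad_i(x, i)      for i in batch], axis=0)
    g_snap = np.mean([grad_i(x_snap, i) for i in batch], axis=0)
    return g - g_snap + full_grad_snap  # unbiased; variance shrinks near x_snap

# A stochastic L-BFGS method of this type then updates x <- x - eta * H(v), where
# v is the variance-reduced gradient and H applies the L-BFGS inverse-Hessian
# approximation (e.g. via the two-loop recursion).
```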
Quasi-Newton Methods: Superlinear Convergence Without Line Searches for Self-Concordant Functions
[article]
2018
arXiv
pre-print
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant ...
of stochastic gradient descent on stochastic optimization problems. ...
Then the adaptive L-BFGS method is globally R-linearly convergent. ...
arXiv:1612.06965v3
fatcat:5dt4s3uemvatpl7enjv43rdcsm
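For reference, the Broyden convex class mentioned here interpolates between BFGS and DFP; with s_k = x_{k+1} - x_k and y_k = \nabla f(x_{k+1}) - \nabla f(x_k), the standard update of the Hessian approximation B_k is

\[
B_{k+1} = B_k - \frac{B_k s_k s_k^{\top} B_k}{s_k^{\top} B_k s_k}
        + \frac{y_k y_k^{\top}}{y_k^{\top} s_k}
        + \phi_k \left(s_k^{\top} B_k s_k\right) v_k v_k^{\top},
\qquad
v_k = \frac{y_k}{y_k^{\top} s_k} - \frac{B_k s_k}{s_k^{\top} B_k s_k},
\]

where \phi_k = 0 recovers BFGS, \phi_k = 1 recovers DFP, and the convex class is \phi_k \in [0, 1]. The paper's contribution is the line-search-free step size used with these updates for self-concordant objectives, which is not reproduced here.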
Quasi-Newton methods: superlinear convergence without line searches for self-concordant functions
2018
Optimization Methods and Software
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant ...
of stochastic gradient descent on stochastic optimization problems. ...
Then the adaptive L-BFGS method is globally R-linearly convergent. ...
doi:10.1080/10556788.2018.1510927
fatcat:xtmmqflsq5grplwt3guuysd6ty
Stochastic Damped L-BFGS with Controlled Norm of the Hessian Approximation
[article]
2020
arXiv
pre-print
Our algorithm, VARCHEN, draws from previous work that proposed a novel stochastic damped L-BFGS algorithm called SdLBFGS. ...
We propose a new stochastic variance-reduced damped L-BFGS algorithm, where we leverage estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and ...
[33] proposed a stochastic damped L-BFGS (SdLBFGS) algorithm and proved almost sure convergence to a stationary point. ...
arXiv:2012.05783v1
fatcat:nwjtqfjjnbeqjcejbldgaul56i
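Damped (L-)BFGS variants such as SdLBFGS modify the curvature pair so the update stays well defined even when stochastic noise spoils the curvature condition. A minimal sketch of the classical Powell-style damping rule is given below; the exact rule in SdLBFGS and VARCHEN may differ, so this is only a generic illustration.

```python
# Illustrative Powell-style damping for (L-)BFGS curvature pairs; the damping
# used in SdLBFGS/VARCHEN is related but not necessarily identical.
import numpy as np

def powell_damped_pair(s, y, B, c=0.2):
    """Return a damped vector r with s @ r >= c * (s @ B @ s), to replace y."""
    sBs = s @ B @ s
    sy = s @ y
    if sy >= c * sBs:
        theta = 1.0
    else:
        theta = (1.0 - c) * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * (B @ s)
    return r  # use the pair (s, r) in place of (s, y) in the BFGS update
```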
Stochastic L-BFGS: Improved Convergence Rates and Practical Acceleration Strategies
2018
IEEE Transactions on Signal Processing
We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm. ...
By proposing a new framework for the convergence analysis, we prove improved convergence rates and computational complexities of the stochastic L-BFGS algorithms compared to previous works. ...
In the literature (Gower et al., 2016; Moritz et al., 2016), usually option I or II (in Algorithm 1) is analyzed theoretically to prove that the stochastic L-BFGS algorithms therein converge linearly ...
doi:10.1109/tsp.2017.2784360
fatcat:u7jv3vxpave7jkbxnmpm7gcp54
Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
[article]
2020
arXiv
pre-print
We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. ...
Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. ...
A linearly-convergent stochastic L-BFGS algorithm. In Artificial Intelligence and Statistics, pages 249-258, 2016.
Yurii Nesterov. Lectures on convex optimization, volume 137. Springer, 2018. ...
arXiv:1910.04920v2
fatcat:jfzvxawxdrcp3ocfbnxmzh4fi4
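The "Hessian-free" implementation mentioned in this entry refers to solving the Newton system with an iterative method that only needs Hessian-vector products. The sketch below shows a generic regularized Newton-CG step in that spirit; the paper's exact R-SSN regularization and subsampling rules are not reproduced, and the regularization parameter here is a placeholder.

```python
# Generic "Hessian-free" subsampled Newton step: solve (H + lam*I) p = -g by
# conjugate gradient, accessing H only through Hessian-vector products.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_cg_step(hvp, g, lam=1e-3):
    """hvp(v) returns H @ v for the (subsampled) Hessian H; g is the gradient."""
    n = g.shape[0]
    A = LinearOperator((n, n), matvec=lambda v: hvp(v) + lam * v)
    p, _ = cg(A, -g)
    return p  # search direction; pair with a step size or line search
```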
Stochastic Block BFGS: Squeezing More Curvature out of Data
[article]
2016
arXiv
pre-print
We propose a novel limited-memory stochastic block BFGS update for incorporating enriched curvature information in stochastic approximation methods. ...
We propose several sketching strategies, present a new quasi-Newton method that uses stochastic block BFGS updates combined with the variance reduction approach SVRG to compute batch stochastic gradients ...
Convergence In this section we prove that Algorithm 1 converges linearly. ...
arXiv:1603.09649v1
fatcat:7tduh5ikgnchzmcz4n6mvxgo4a
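The "enriched curvature" idea can be pictured as follows: instead of a single (s, y) pair, the method forms the action of a subsampled Hessian on a random sketch matrix with several columns, yielding multiple curvature directions per iteration. The snippet below is only a sketch of that data-gathering step under this reading; the block BFGS update that consumes the sketch is not reproduced.

```python
# Sketch of gathering block curvature information: Y = H_S @ D for a subsampled
# Hessian H_S (accessed via Hessian-vector products) and a random sketch D.
import numpy as np

def sketched_curvature(hvp_subsampled, dim, num_cols, rng=np.random.default_rng(0)):
    """hvp_subsampled(v) returns H_S @ v; returns the sketch D and Y = H_S @ D."""
    D = rng.standard_normal((dim, num_cols))
    Y = np.column_stack([hvp_subsampled(D[:, j]) for j in range(num_cols)])
    return D, Y
```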
An Inexact Variable Metric Proximal Point Algorithm for Generic Quasi-Newton Acceleration
[article]
2019
arXiv
pre-print
When combined with limited-memory BFGS rules, QNing is particularly effective to solve high-dimensional optimization problems, while enjoying a worst-case linear convergence rate for strongly convex problems ...
The proposed scheme, called QNing can be notably applied to incremental first-order methods such as the stochastic variance-reduced gradient descent algorithm (SVRG) and other randomized incremental optimization ...
This work was supported by the ERC grant SOLARIS (number 714381), a grant from ANR (MACARON project ANR-14-CE23-0003-01), and the program "Learning in Machines and Brains" (CIFAR). ...
arXiv:1610.00960v4
fatcat:flhlxb6pa5athlm6enwswmy6zq
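The smoothing construction behind this style of quasi-Newton acceleration can be written compactly: L-BFGS steps are applied not to f itself but to a Moreau-envelope-type surrogate

\[
F(x) \;=\; \min_{z \in \mathbb{R}^d} \Big\{ f(z) + \tfrac{\kappa}{2}\,\lVert z - x \rVert^2 \Big\},
\]

which, for convex f, has the same minimizers as f, is smooth, and has gradient \nabla F(x) = \kappa\,(x - z^\ast(x)), where z^\ast(x) solves the inner problem; the inner problem is solved inexactly with an incremental method such as SVRG. The inexactness criteria that make this sound are the technical core of the paper and are not reproduced here.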
Deep Reinforcement Learning via L-BFGS Optimization
[article]
2019
arXiv
pre-print
Methods for solving the optimization problems in deep RL are restricted to the class of first-order algorithms, such as stochastic gradient descent (SGD). ...
The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive definite Hessian approximations. ...
We used stochastic line search L-BFGS method as the optimization method (Algorithm 2). ...
arXiv:1811.02693v2
fatcat:dgqrwcko5vbmddysmhch5fg244
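The "stochastic line search" ingredient mentioned in this entry can be illustrated with a backtracking Armijo rule evaluated on a minibatch loss; the sketch below is generic and the paper's exact acceptance rule and batch handling may differ.

```python
# Generic backtracking (Armijo) line search on a minibatch loss; a sketch of the
# stochastic line-search ingredient, not the paper's exact rule.
def backtracking_armijo(loss, x, p, g, alpha0=1.0, c1=1e-4, shrink=0.5, max_iter=30):
    """loss(w): minibatch loss; p: descent direction; g: gradient at x."""
    f0 = loss(x)
    slope = c1 * float(g @ p)  # required decrease per unit step (negative for descent p)
    alpha = alpha0
    for _ in range(max_iter):
        if loss(x + alpha * p) <= f0 + alpha * slope:
            return alpha
        alpha *= shrink
    return alpha  # fall back to the last (small) step size
```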
Asynchronous Parallel Stochastic Quasi-Newton Methods
[article]
2020
arXiv
pre-print
Adopting the variance reduction technique, a prior stochastic L-BFGS, which has not been designed for parallel computing, reaches a linear convergence rate. ...
Unlike prior attempts, which parallelize only the calculation for gradient or the two-loop recursion of L-BFGS, our algorithm is the first one that truly parallelizes L-BFGS with a convergence guarantee ...
L-BFGS) enjoy a superlinear convergence rate, while the stochastic version of quasi-Newton methods (including L-BFGS) will have a sublinear convergence rate in strongly convex optimization as a sacrifice ...
arXiv:2011.00667v1
fatcat:w2so7l3imna73aoyhwpn52qfeq
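For reference, the two-loop recursion mentioned in this entry is the standard way L-BFGS applies its implicit inverse-Hessian approximation to a vector using only the stored curvature pairs; a minimal version:

```python
# Standard L-BFGS two-loop recursion: apply the implicit inverse-Hessian
# approximation H_k to a vector q using the m most recent pairs (s_i, y_i),
# stored oldest to newest, without forming any matrix.
import numpy as np

def two_loop_recursion(q, s_list, y_list):
    """Return H_k @ q for the L-BFGS inverse-Hessian approximation H_k."""
    q = q.copy()
    alphas, rhos = [], []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / float(y @ s)
        alpha = rho * float(s @ q)
        q -= alpha * y
        rhos.append(rho)
        alphas.append(alpha)
    if s_list:  # standard initial scaling H_0 = gamma * I
        s, y = s_list[-1], y_list[-1]
        q *= float(s @ y) / float(y @ y)
    for s, y, rho, alpha in zip(s_list, y_list, reversed(rhos), reversed(alphas)):
        beta = rho * float(y @ q)
        q += (alpha - beta) * s
    return q
```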
Practical Quasi-Newton Methods for Training Deep Neural Networks
[article]
2021
arXiv
pre-print
We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs). ...
Consequently, computing and storing a full n × n BFGS approximation or storing a modest number of (step, change in gradient) vector pairs for use in an L-BFGS implementation is out of the question. ...
In this section, we prove the convergence of Algorithm 5, a variant of K-BFGS(L). Algorithm 5 is very similar to our actual implementation of K-BFGS(L) (i.e. ...
arXiv:2006.08877v3
fatcat:oow4oj6kcvf6bpcvlaakluzlwm
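The Kronecker-factored block-diagonal structure in the title is what makes such methods tractable at DNN scale: if a layer's curvature block is approximated as A \otimes G, with one small factor per side of the weight matrix, then applying the inverse only requires the small factors, via the standard identity

\[
(A \otimes G)^{-1} \operatorname{vec}(V) \;=\; \operatorname{vec}\!\left( G^{-1} V A^{-\top} \right),
\]

where V is the layer's gradient reshaped as a matrix with rows matching G and columns matching A. The cost thus scales with the factor dimensions rather than with the full n x n block; which quantities define A and G in K-BFGS is not reproduced here.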
Stochastic Second-Order Optimization via von Neumann Series
[article]
2017
arXiv
pre-print
A stochastic iterative algorithm approximating second-order information using von Neumann series is discussed. We present convergence guarantees for strongly-convex and smooth functions. ...
In numerical experiments, the error behaves similarly to that of the second-order algorithm L-BFGS, and the method improves the performance of LISSA for quadratic objective functions. ...
Figure 2: A numerical experiment comparing the ISSA, LISSA, BFGS and L-BFGS algorithms. In a), τ = 5 for ISSA, and L-BFGS likewise used the last 5 gradient pairs. ...
arXiv:1612.04694v4
fatcat:5mr4n6lhy5crnojym2isn5z6li
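The von Neumann series mentioned in this entry gives a matrix-free way to apply an approximate inverse Hessian. The sketch below shows the basic deterministic recurrence; LISSA-type methods replace the exact Hessian-vector product with a stochastic one, a refinement omitted here.

```python
# Sketch of the von Neumann series idea: for A with eigenvalues in (0, 2)
# (e.g. a Hessian rescaled by a bound on its largest eigenvalue), the recurrence
# v <- b + (I - A) v converges to A^{-1} b.
import numpy as np

def neumann_inverse_vec(matvec, b, num_terms=100):
    """Approximate A^{-1} b with a truncated Neumann series; matvec(v) = A @ v."""
    v = b.copy()
    for _ in range(num_terms):
        v = b + v - matvec(v)   # v <- b + (I - A) v
    return v
```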
Batch-Expansion Training: An Efficient Optimization Framework
[article]
2018
arXiv
pre-print
We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset. ...
As opposed to stochastic approaches, batches do not need to be resampled i.i.d. at every iteration, thus making BET more resource efficient in a distributed setting, and when disk-access is constrained ...
Furthermore, the time complexity of performing a single iteration for many of those algorithms (including GD and L-BFGS) is linearly proportional to the data size. ...
arXiv:1704.06731v3
fatcat:zxjm4ezou5aptliuexs6bfcdc4
On the Acceleration of L-BFGS with Second-Order Information and Stochastic Batches
[article]
2018
arXiv
pre-print
This paper proposes a framework of L-BFGS based on the (approximate) second-order information with stochastic batches, as a novel approach to the finite-sum minimization problems. ...
Different from the classical L-BFGS where stochastic batches lead to instability, we use a smooth estimate for the evaluations of the gradient differences while achieving acceleration by well-scaling the ...
Convergence Analysis In this section, we study the convergence of our stochastic L-BFGS framework. ...
arXiv:1807.05328v1
fatcat:laxjqpcvr5aarormumbbicqrdm
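This entry mentions smoothing the gradient-difference estimates used to build curvature pairs. One generic way to do this, shown purely as an illustration and not necessarily the paper's rule, is an exponential moving average of the minibatch gradient differences.

```python
# Generic smoothing of minibatch gradient differences for (L-)BFGS curvature
# pairs; an illustrative placeholder, not the paper's estimator.
import numpy as np

class SmoothedCurvature:
    def __init__(self, beta=0.9):
        self.beta = beta
        self.y_bar = None

    def update(self, g_new, g_old):
        y = g_new - g_old                      # raw minibatch gradient difference
        if self.y_bar is None:
            self.y_bar = y
        else:
            self.y_bar = self.beta * self.y_bar + (1.0 - self.beta) * y
        return self.y_bar                      # smoothed y for the L-BFGS pair
```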
Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part II
[article]
2022
arXiv
pre-print
In Part I of this work, we have proposed a general framework of decentralized stochastic quasi-Newton methods, which converge linearly to the optimal solution under the assumption that the local Hessian ...
Numerical experiments demonstrate that the proposed quasi-Newton methods are much faster than the existing decentralized stochastic first-order algorithms. ...
An online limited-memory BFGS using stochastic gradients in lieu of the full gradient in the BFGS update is proposed in [6]; the convergence analysis is given in [7]. ...
arXiv:2201.07733v1
fatcat:fvn7b3mcsngavhwoah3ayipqce
Showing results 1 — 15 out of 1,652 results