173 Hits in 6.4 sec

Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization

Jialun Zhang, Salar Fattahi, Richard Y. Zhang
2021 Neural Information Processing Systems  
We propose an inexpensive preconditioner for the matrix-sensing variant of nonconvex matrix factorization that restores the convergence rate of gradient descent back to linear, even in the over-parameterized  ...  In a neighborhood of the solution, classical gradient descent slows down because the over-parameterized model's matrix factor must become singular.  ...  Conclusions In this paper, we propose preconditioned gradient descent, or PrecGD, for nonconvex matrix factorization, with a per-iteration cost comparable to classical gradient descent.  ... 
dblp:conf/nips/ZhangFZ21 fatcat:ns2f6x4eabexfh6caul4m2qeha
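To make the abstract's idea concrete, here is a minimal numpy sketch of a PrecGD-style update on a toy symmetric objective f(X) = ||XXᵀ − M||²_F. The toy objective, step size, and the damping rule (tying the damping to the current residual norm, i.e. to √f(X)) are illustrative assumptions modeled on the abstract, not the paper's exact algorithm:

```python
import numpy as np

def precgd(M, r, steps=1000, lr=0.1, seed=0):
    """Sketch of preconditioned gradient descent (PrecGD) on the toy
    objective f(X) = ||X X^T - M||_F^2 with an over-parameterized
    n x r factor X. The right-preconditioner (X^T X + eta I)^{-1},
    with damping eta tied to the current loss, is an assumption
    modeled on the abstract."""
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((M.shape[0], r))
    for _ in range(steps):
        R = X @ X.T - M
        grad = 4 * R @ X                   # gradient of f
        eta = np.linalg.norm(R)            # damping ~ sqrt(f(X))
        P = np.linalg.inv(X.T @ X + eta * np.eye(r))
        X -= lr * grad @ P                 # preconditioned step
    return X

# rank-2 ground truth, over-parameterized with r = 4
rng = np.random.default_rng(1)
U = rng.standard_normal((8, 2))
M = U @ U.T
X = precgd(M, r=4)
print(np.linalg.norm(X @ X.T - M))
```

The preconditioner is an r × r inverse, so the per-iteration overhead over plain gradient descent is small when r ≪ n, consistent with the abstract's cost claim.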

Blended Cured Quasi-Newton for Geometric Optimization [article]

Yufeng Zhu, Robert Bridson, Danny M. Kaufman
2018 arXiv   pre-print
model that adaptively blends the Sobolev (inverse-Laplacian-processed) gradient and L-BFGS descent to gain the advantages of both, while avoiding L-BFGS's current limitations in geometry optimization  ...  Together these improvements form the basis for Blended Cured Quasi-Newton (BCQN), a new geometry optimization algorithm.  ...  First-order methods build descent steps by preconditioning the gradient with a fixed proxy matrix.  ... 
arXiv:1705.00039v2 fatcat:ink3wsynyfc6xb4wwav5gkegwa

Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Completion

Tian Tong, Cong Ma, Ashley Prater-Bennette, Erin Tripp, Yuejie Chi
2022 International Conference on Artificial Intelligence and Statistics  
Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm to directly recover the tensor factors with tailored spectral  ...  Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of  ...  DISCUSSIONS This paper develops a scaled gradient descent algorithm over the factor space for low-rank tensor completion with provable sample and computational guarantees, leading to a highly scalable  ... 
dblp:conf/aistats/TongMPTC22 fatcat:qvuuh67pt5bhvjlzayl74rwlhe

Preconditioned Gradient Descent for Overparameterized Nonconvex Burer–Monteiro Factorization with Global Optimality Certification [article]

Gavin Zhang, Salar Fattahi, Richard Y. Zhang
2022 arXiv   pre-print
We consider using gradient descent to minimize the nonconvex function f(X)=ϕ(XX^T) over an n× r factor matrix X, in which ϕ is an underlying smooth convex cost function defined over n× n matrices.  ...  Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with r=r^⋆ to a sublinear rate when r>r^⋆, even when ϕ is strongly convex.  ...  The Burer-Monteiro approach eliminates the large n×n positive semidefinite matrix by reformulating the problem as minimizing the nonconvex function f (X) = φ(XX T ) over an n × r factor matrix X.  ... 
arXiv:2206.03345v1 fatcat:tey4zozjdrfvpg2wi5hbzqg3rm
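The slowdown described in this abstract is easy to reproduce. Here is a small numpy experiment (problem sizes, step size, and initialization scale are illustrative assumptions) running plain gradient descent on the Burer–Monteiro objective with the exact rank r = r⋆ versus an over-parameterized r > r⋆:

```python
import numpy as np

def gd_residual(M, r, steps=3000, lr=0.05, seed=0):
    """Plain gradient descent on the Burer-Monteiro objective
    f(X) = ||X X^T - M||_F^2; used to illustrate the slowdown the
    abstract describes when r exceeds the true rank r*."""
    rng = np.random.default_rng(seed)
    X = 0.3 * rng.standard_normal((M.shape[0], r))
    for _ in range(steps):
        X -= lr * 4 * (X @ X.T - M) @ X
    return np.linalg.norm(X @ X.T - M)

rng = np.random.default_rng(0)
U = rng.standard_normal((10, 2))
M = U @ U.T
M /= np.linalg.norm(M, 2)          # normalize so lr = 0.05 is stable
exact = gd_residual(M, r=2)        # r = r*: fast (linear) convergence
over = gd_residual(M, r=5)         # r > r*: much slower convergence
print(exact, over)
```

With r = r⋆ the residual decays geometrically to machine precision; with r > r⋆ the excess directions of X shrink only sublinearly, so the residual after the same number of steps is orders of magnitude larger.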

Equilibrated adaptive learning rates for non-convex optimization [article]

Yann N. Dauphin, Harm de Vries, Yoshua Bengio
2015 arXiv   pre-print
Our experiments show that ESGD performs as well or better than RMSProp in terms of convergence speed, always clearly improving over plain stochastic gradient descent.  ...  For the better conditioned function (b) these oscillations are reduced, and gradient descent makes faster progress.  ...  In van der Sluis (1969) it is shown that the condition number of a row-equilibrated matrix is at most a factor √N worse than that obtained with the diagonal preconditioning matrix that optimally reduces the condition number.  ... 
arXiv:1502.04390v2 fatcat:fyicqskqxrbxzivryvpcljtypm
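The equilibration preconditioner behind ESGD can be estimated from Hessian-vector products alone: for v ~ N(0, I), E[(Hv)ᵢ²] equals the squared 2-norm of row i of H. A minimal sketch (the diagonal test Hessian and sample count are illustrative choices):

```python
import numpy as np

def equilibration_preconditioner(hvp, dim, n_samples=100, rng=None):
    """Estimate the diagonal equilibration preconditioner
    D_ii = ||H_i||_2 from Hessian-vector products alone, as ESGD
    does: for v ~ N(0, I), E[(Hv)_i^2] is the squared 2-norm of
    row i of H."""
    rng = rng or np.random.default_rng(0)
    acc = np.zeros(dim)
    for _ in range(n_samples):
        v = rng.standard_normal(dim)
        acc += hvp(v) ** 2                # accumulate (Hv)_i^2
    return np.sqrt(acc / n_samples)

# ill-conditioned quadratic f(x) = 0.5 x^T H x with known Hessian
H = np.diag([100.0, 1.0, 0.01])
D = equilibration_preconditioner(lambda v: H @ v, 3, n_samples=2000)
print(D)  # approximately the row norms [100, 1, 0.01]
```

Dividing the stochastic gradient elementwise by D then rescales the badly scaled directions, which is the source of the improvement over plain SGD reported in the abstract.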

Accelerating Certifiable Estimation with Preconditioned Eigensolvers [article]

David M. Rosen
2022 arXiv   pre-print
entails computing a minimum eigenpair of a certain symmetric certificate matrix.  ...  gradient (LOBPCG) method together with a simple yet highly effective algebraic preconditioner.  ...  Second, LOBPCG employs subspace acceleration: instead of searching solely over the 1-dimensional subspace spanned by w (as in basic preconditioned gradient descent), at each iteration it calculates the  ... 
arXiv:2207.05257v1 fatcat:exjz3ixjevgmrapvwv2ivctqsq
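The snippet contrasts LOBPCG with "basic preconditioned gradient descent" on the Rayleigh quotient; here is a numpy sketch of that baseline for a minimum eigenpair. A Jacobi (diagonal) preconditioner stands in for the paper's algebraic preconditioner, and the test matrix is an illustrative 1-D Laplacian rather than a certificate matrix:

```python
import numpy as np

def min_eigpair_pgd(A, iters=300, seed=0):
    """Basic preconditioned gradient descent on the Rayleigh
    quotient -- the baseline LOBPCG improves on with subspace
    acceleration. Uses a Jacobi (diagonal) preconditioner; with a
    perfect preconditioner the step reduces to inverse iteration."""
    rng = np.random.default_rng(seed)
    d = np.diag(A)                       # Jacobi preconditioner
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        lam = x @ A @ x                  # Rayleigh quotient
        r = A @ x - lam * x              # eigen-residual (gradient direction)
        x = x - r / d                    # preconditioned descent step
        x /= np.linalg.norm(x)
    return lam, x

# 1-D Laplacian: symmetric positive definite, smallest eigenpair sought
n = 10
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam, x = min_eigpair_pgd(A)
print(lam, np.linalg.eigvalsh(A)[0])
```

LOBPCG replaces the single line `x = x - r / d` with a Rayleigh–Ritz step over the subspace spanned by x, the preconditioned residual, and the previous iterate, which is the subspace acceleration the snippet describes.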

Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization

Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Pack Kaelbling
2021 Neural Information Processing Systems  
We prove that, for a linear parametrization, gradient descent converges to global optima despite nonlinearity and non-convexity introduced by the implicit representation.  ...  Furthermore, we derive convergence rates for both cases which allow us to identify conditions under which stochastic gradient descent (SGD) with this implicit representation converges substantially faster  ...  Acknowledgements We gratefully acknowledge support from NSF grant 1723381; from AFOSR grant FA9550-17-1-0165; from ONR grant N00014-18-1-2847; from MIT-IBM Watson Lab; from the MIT Quest for Intelligence  ... 
dblp:conf/nips/GehringKHK21 fatcat:g3v7udbqlfas5g62yzk27z5ou4

Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks

Guodong Zhang, James Martens, Roger B. Grosse
2019 Neural Information Processing Systems  
In this work, we analyze for the first time the speed of convergence of natural gradient descent on nonlinear neural networks with squared-error loss.  ...  row rank, and (2) the Jacobian matrix is stable for small perturbations around the initialization.  ...  HaoChen, Shengyang Sun and Mufan Li for helpful discussion. RG acknowledges support from the CIFAR Canadian AI Chairs program and the Ontario MRIS Early Researcher Award.  ... 
dblp:conf/nips/ZhangMG19 fatcat:2zjruytlpvdzfefl7p6wsrk7la
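For squared-error loss the Fisher information coincides with the Gauss–Newton matrix JᵀJ, so a natural-gradient step is a damped Gauss–Newton solve. The sketch below (the damping term and the linear over-parameterized model are illustrative assumptions) shows why a full-row-rank Jacobian, the condition in the snippet, makes NGD fast: one step nearly interpolates the data.

```python
import numpy as np

def natural_gradient_step(J, residual, theta, damping=1e-8):
    """One natural-gradient step for squared-error loss. For this
    loss the Fisher matrix is the Gauss-Newton matrix J^T J, so the
    step solves (J^T J + damping I) d = J^T residual; the damping
    is an assumption added for numerical stability."""
    F = J.T @ J + damping * np.eye(J.shape[1])
    return theta - np.linalg.solve(F, J.T @ residual)

# over-parameterized linear model: 5 data points, 20 parameters,
# so the Jacobian X has full row rank
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
y = rng.standard_normal(5)
theta = np.zeros(20)
theta = natural_gradient_step(X, X @ theta - y, theta)
print(np.linalg.norm(X @ theta - y))   # one step nearly interpolates
```

For genuinely nonlinear networks the Jacobian changes along the trajectory, which is where the snippet's second condition (stability of the Jacobian near initialization) enters the analysis.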

Black-Box Variational Inference as a Parametric Approximation to Langevin Dynamics

Matthew D. Hoffman, Yian Ma
2020 International Conference on Machine Learning  
In particular, a close examination of the Fokker-Planck equation that governs the Langevin dynamics (LD) MCMC procedure reveals that LD implicitly follows a gradient flow that corresponds to a variational  ...  suggests that the transient bias of LD (due to the Markov chain not having burned in) may track that of VI (due to the optimizer not having converged), up to differences due to VI's asymptotic bias and parameterization  ...  Saurous, Pavel Sountsov, Chris Suter, Srinivas Vasudevan, and Sharad Vikram for helpful and enjoyable discussions and feedback, as well as the anonymous reviewers for their helpful suggestions.  ... 
dblp:conf/icml/HoffmanM20 fatcat:rmyydygdrzckhnwxjdm3io2ycy

An efficient preconditioner for stochastic gradient descent optimization of image registration

Yuchuan Qiao, Boudewijn P.F. Lelieveldt, Marius Staring
2019 IEEE Transactions on Medical Imaging  
Stochastic gradient descent (SGD) is commonly used to solve (parametric) image registration problems.  ...  Based on the observed distribution of voxel displacements in the registration, we estimate the diagonal entries of a preconditioning matrix, thus rescaling the optimization cost function.  ...  Stolk are acknowledged for providing a ground truth for the SPREAD study data used in this paper.  ... 
doi:10.1109/tmi.2019.2897943 pmid:30762536 fatcat:fotwr4k66bcdxis2hlntmu4yra

A Validation Approach to Over-parameterized Matrix and Image Recovery [article]

Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu
2022 arXiv   pre-print
We then solve the associated nonconvex problem using gradient descent with small random initialization.  ...  Moreover, experiments show that the proposed validation approach can also be efficiently used for image restoration with deep image prior, which over-parameterizes an image with a deep network.  ...  Thus, we can apply the same gradient descent to solve the over-parameterized problem to recover the ground-truth matrix X.  ... 
arXiv:2209.10675v1 fatcat:dopwdry3jncgpa45ax2gsmjy2e

A Variational Analysis of Stochastic Gradient Algorithms [article]

Stephan Mandt, Matthew D. Hoffman, David M. Blei
2016 arXiv   pre-print
Stochastic Gradient Descent (SGD) is an important algorithm in machine learning.  ...  We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling.  ...  The stationary distribution is parameterized by the learning rate, minibatch size, and preconditioning matrix.  ... 
arXiv:1602.02666v1 fatcat:rsx7wibckbfttcwg3udauobhqe
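The stationary distribution the snippet mentions is easy to verify in the simplest case. For SGD on a 1-D quadratic f(x) = ax²/2 with additive Gaussian gradient noise, the constant-rate iteration is an AR(1) process whose stationary variance has a closed form; the sketch below (parameter values are illustrative) checks the prediction empirically:

```python
import numpy as np

# SGD on f(x) = a x^2 / 2 with additive Gaussian gradient noise of
# variance sigma^2 and constant learning rate lr: the iterates
# x_{t+1} = (1 - lr*a) x_t - lr*eps_t form an AR(1) process with
# stationary variance lr*sigma^2 / (a * (2 - lr*a)), a simple
# instance of the constant-rate analysis described in the abstract.
a, lr, sigma = 1.0, 0.1, 1.0
rng = np.random.default_rng(0)
x, xs = 0.0, []
for t in range(200_000):
    grad = a * x + sigma * rng.standard_normal()
    x -= lr * grad
    if t > 1000:                       # discard burn-in
        xs.append(x)
empirical = np.var(xs)
predicted = lr * sigma**2 / (a * (2 - lr * a))
print(empirical, predicted)
```

The stationary variance scales with the learning rate, which is why the abstract can treat the learning rate, minibatch size, and preconditioning matrix as variational parameters of an approximate posterior.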

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization [article]

Sanjeev Arora, Nadav Cohen, Elad Hazan
2018 arXiv   pre-print
Even on simple convex problems such as linear regression with ℓ_p loss, p>2, gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common  ...  We also prove that it is mathematically impossible to obtain the acceleration effect of overparametrization via gradients of any regularizer.  ...  What happens to gradient descent on this nonconvex objective? Observation 1.  ... 
arXiv:1802.06509v2 fatcat:nreu5rshpngwpnmofdjsbvoyiq

Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization [article]

Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu
2019 arXiv   pre-print
In this paper, we fully characterize the convergence theory of gradient descent for learning over-parameterized ResNet with different values of τ.  ...  Recent work established the convergence of learning over-parameterized ResNet with a scaling factor τ=1/L on the residual branch where L is the network depth.  ...  Yingbin Liang for many helpful discussions and thank Zeyuan Allen-Zhu for clarifying the proof in Allen-Zhu et al. [2018b] and discussion.  ... 
arXiv:1903.07120v4 fatcat:46qyjln7jza6zggvo2lem3dwcq

Algorithmic Regularization in Model-free Overparametrized Asymmetric Matrix Factorization [article]

Liwei Jiang, Yudong Chen, Lijun Ding
2022 arXiv   pre-print
We study the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization.  ...  Our theoretical results provide accurate predictions for the behavior of gradient descent, showing good agreement with numerical experiments.  ...  Preconditioned gradient descent for over-parameterized nonconvex matrix factorization. Advances in Neural Information Processing Systems, 34, 2021. [Zha21] Richard Y Zhang.  ... 
arXiv:2203.02839v2 fatcat:d2xua7c5ejecxoshfdgwguxywy