
Accelerating Stochastic Gradient Descent For Least Squares Regression [article]

Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli and Aaron Sidford
2018 arXiv   pre-print
This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes the conventional wisdom by showing that acceleration  ...  Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process.  ...  Acknowledgments: Sham Kakade acknowledges funding from Washington Research Foundation Fund for Innovation in Data-Intensive Discovery and the NSF through awards CCF-1637360, CCF-1703574 and CCF-1740551  ... 
arXiv:1704.08227v2

Accelerated Stochastic Block Coordinate Descent with Optimal Sampling

Aston Zhang, Quanquan Gu
2016 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16  
We propose an accelerated stochastic block coordinate descent (ASBCD) algorithm, which incorporates the incrementally averaged partial derivative into the stochastic partial derivative and exploits optimal  ...  Specifically, we consider the case where the non-differentiable function is block separable and admits a simple proximal mapping for each block.  ...  We would like to thank the anonymous reviewers for their helpful comments.  ... 
doi:10.1145/2939672.2939819 dblp:conf/kdd/ZhangG16


2018 Journal of Engineering Science and Technology  
Nowadays, many variants of gradient descent (i.e., methods used in machine learning for regression) have been proposed.  ...  Average Gradient Descent (SAGD), Momentum Gradient Descent (MGD), Accelerated Gradient Descent (AGD), Adagrad, Adadelta, RMSprop and Adam.  ...  Journal of Engineering Science and Technology August 2018, Vol. 13(8) A.B.D.N acknowledged RISTEK DIKTI for grant-in-aid in Penelitian Terapan Unggulan Perguruan Tinggi Negeri (PTUPT) and Penelitian  ... 
doaj:e860373b2c534dafa89c83f0b0e57c0d

Mixing of Stochastic Accelerated Gradient Descent [article]

Peiyuan Zhang, Hadi Daneshmand, Thomas Hofmann
2019 arXiv   pre-print
We study the mixing properties of stochastic accelerated gradient descent (SAGD) on least-squares regression.  ...  First, we show that stochastic gradient descent (SGD) and SAGD simulate the same invariant distribution.  ... 
arXiv:1910.14616v1

Accelerated gradient boosting

G. Biau, B. Cadre, L. Rouvière
2019 Machine Learning  
We combine gradient boosting and Nesterov's accelerated descent to design a new algorithm, which we call AGB (for Accelerated Gradient Boosting).  ...  Gradient tree boosting is a prediction algorithm that sequentially produces a model in the form of linear combinations of decision trees, by solving an infinite-dimensional optimization problem.  ...  Acknowledgements We greatly thank two referees for valuable comments and insightful suggestions, which led to a substantial improvement of the paper.  ... 
doi:10.1007/s10994-019-05787-1
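The Nesterov-style acceleration that AGB grafts onto boosting follows the standard accelerated-gradient template: take the gradient step at a look-ahead point, then extrapolate with momentum. A minimal sketch on a quadratic objective (illustrative only — the function name and the constant-momentum variant are ours, not the AGB algorithm itself):

```python
import numpy as np

def nesterov_agd(grad, x0, lr, momentum, steps):
    """Nesterov accelerated gradient descent (constant-momentum variant).

    Evaluates the gradient at a look-ahead point y, then extrapolates.
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    for _ in range(steps):
        x_next = y - lr * grad(y)              # gradient step at the look-ahead point
        y = x_next + momentum * (x_next - x)   # momentum extrapolation
        x = x_next
    return x

# Minimize f(x) = 0.5 * x @ A @ x - b @ x, whose gradient is A @ x - b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 2.0])
x_hat = nesterov_agd(lambda x: A @ x - b, np.zeros(2), lr=0.3, momentum=0.3, steps=300)
```

AGB applies this same two-sequence update in function space, with each "gradient step" being a fitted regression tree rather than an explicit gradient.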

The Implicit Regularization of Stochastic Gradient Flow for Least Squares [article]

Alnur Ali, Edgar Dobriban, Ryan J. Tibshirani
2020 arXiv   pre-print
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression.  ...  We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow.  ...  Preliminaries: Least Squares, Stochastic Gradient Descent, and Stochastic Gradient Flow. Consider the usual least squares regression problem, $\min_{\beta \in \mathbb{R}^p} \frac{1}{2n} \|y - X\beta\|_2^2$ (1), where $y \in \mathbb{R}^n$ is the response  ... 
arXiv:2003.07802v2
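The least-squares objective in the snippet, (1/2n)‖y − Xβ‖², is minimized by the mini-batch SGD iteration the paper analyzes. A self-contained sketch on synthetic noiseless data (variable names and step-size choices are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true  # noiseless, so the least-squares minimizer is beta_true

beta = np.zeros(p)
lr, batch = 0.05, 20
for step in range(4000):
    idx = rng.integers(0, n, size=batch)    # sample a mini-batch with replacement
    resid = X[idx] @ beta - y[idx]
    beta -= lr * X[idx].T @ resid / batch   # stochastic gradient step
```

In the noiseless (interpolating) setting the gradient noise vanishes at the optimum, so constant-step SGD converges to the minimizer; with label noise it would instead hover in a neighborhood whose size scales with the step size.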

Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors [article]

Atsushi Nitanda, Taiji Suzuki
2021 arXiv   pre-print
In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition, but the theoretical analysis provided was limited to the squared loss function  ...  We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space.  ...  More recently, Pillaud-Vivien et al. (2017) exhibited exponential convergence of stochastic gradient descent for solving regularized least-squares regression for classification problems by using this condition  ... 
arXiv:1806.05438v3

Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression [article]

Aymeric Dieuleveut, Nicolas Flammarion (LIENS, SIERRA), Francis Bach
2016 arXiv   pre-print
We present the first algorithm that achieves jointly the optimal prediction error rates for least-squares regression, both in terms of forgetting of initial conditions in O(1/n^2), and in terms of dependence  ...  Our new algorithm is based on averaged accelerated regularized gradient descent, and may also be analyzed through finer assumptions on initial conditions and the Hessian matrix, leading to dimension-free  ...  Acknowledgements The authors would like to thank Damien Garreau for helpful discussions.  ... 
arXiv:1602.05419v2

Gradient Projection Iterative Sketch for Large-Scale Constrained Least-Squares [article]

Junqi Tang, Mohammad Golbabaee, Mike Davies
2017 arXiv   pre-print
We propose a randomized first order optimization algorithm Gradient Projection Iterative Sketch (GPIS) and an accelerated variant for efficiently solving large scale constrained Least Squares (LS).  ...  We provide theoretical convergence analysis for both proposed algorithms and demonstrate our methods' computational efficiency compared to the classical accelerated gradient method and the state of the art  ...  The authors also give thanks to the anonymous reviewers for insightful comments.  ... 
arXiv:1609.09419v4

The Practicality of Stochastic Optimization in Imaging Inverse Problems [article]

Junqi Tang, Karen Egiazarian, Mohammad Golbabaee, Mike Davies
2019 arXiv   pre-print
In this work we investigate the practicality of stochastic gradient descent and recently introduced variants with variance-reduction techniques in imaging inverse problems.  ...  Using standard tools in numerical linear algebra, we derive conditions on the spectral structure of the inverse problem under which stochastic gradient methods are suitable.  ...  descent and minibatch SGD on constrained least-squares.  ... 
arXiv:1910.10100v2

Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization

Shai Shalev-Shwartz, Tong Zhang
2014 Mathematical programming  
We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure.  ...  We analyze the runtime of the framework and obtain rates that improve state-of-the-art results for various key machine learning optimization problems including SVM, logistic regression, ridge regression  ...  Acknowledgements The authors would like to thank Fen Xia for careful proofreading of the paper which helped us correct numerous typos.  ... 
doi:10.1007/s10107-014-0839-0

Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization [article]

Rishabh Iyer, John T. Halloran, Kai Wei
2018 arXiv   pre-print
machine learning classifiers and regressors (Logistic Regression, SVMs, Least Square Regression, etc.).  ...  Jensen implements a framework of convex (or loss) functions, convex optimization algorithms (including Gradient Descent, L-BFGS, Stochastic Gradient Descent, Conjugate Gradient, etc.), and a family of  ...  SVCDual: dual coordinate descent algorithm for SVMs; sgd: stochastic gradient descent (SGD) with static step size; sgdDecayingLearningRate: SGD with a specified rate of decay; sgdAdagrad: SGD with AdaGrad  ... 
arXiv:1807.06574v1

Poor starting points in machine learning [article]

Mark Tygert
2016 arXiv   pre-print
In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor starting points -- indeed, for poor  ...  starting points Nesterov acceleration can help during the initial iterations, even though Nesterov methods, which are not designed for stochastic approximation, could hurt during later iterations.  ...  Introduction The scheme of Robbins and Monro (online stochastic gradient descent) has long been known to be optimal for stochastic approximation/optimization . . . provided that the starting point for  ... 
arXiv:1602.02823v1

Improved SVRG for quadratic functions [article]

Nabil Kahale
2021 arXiv   pre-print
The algorithm is a variant of the stochastic variance reduced gradient (SVRG).  ...  In several applications, including least-squares regressions, ridge regressions, linear discriminant analysis and regularized linear discriminant analysis, the running time of each iteration is proportional  ...  Dieuleveut, Flammarion and Bach (2017) study an averaged accelerated regularized SGD algorithm for least-squares regressions.  ... 
arXiv:2006.01017v2

Optimization in learning and data analysis

Stephen J. Wright
2013 Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '13  
Accelerated Gradient (and its cousins); Stochastic Gradient; Coordinate Descent; Shrinking techniques for regularized formulations; Higher-order methods; Augmented Lagrangians, Splitting, ADMM.  ...  Steepest descent, accelerated gradient, stochastic gradient, and higher-order methods can all be extended to the regularized case by replacing the line step with a shrink operation.  ...  (Thus may be good for special problems, at extreme scale.) Multicore implementation is easy when an asynchronous solver is used on the QP subproblems.  ... 
doi:10.1145/2487575.2492149 dblp:conf/kdd/Wright13
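The "shrink operation" in the snippet is the proximal map of the regularizer; for an ℓ1 penalty it is elementwise soft-thresholding, and inserting it after each gradient step yields proximal gradient descent (ISTA). A generic sketch (not code from the talk; the function names are ours):

```python
import numpy as np

def shrink(v, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1, elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, lr, steps):
    """Proximal gradient (ISTA) for 0.5*||X @ b - y||^2 + lam*||b||_1."""
    b = np.zeros(X.shape[1])
    for _ in range(steps):
        g = X.T @ (X @ b - y)             # gradient of the smooth part
        b = shrink(b - lr * g, lr * lam)  # shrink step replaces the plain update
    return b
```

With lam = 0 the shrink is the identity and ISTA reduces to ordinary gradient descent; the same substitution applies to accelerated and stochastic variants, as the slide notes.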
Showing results 1 — 15 out of 5,343 results