35,897 Hits in 6.0 sec

Linear Convergence of Adaptive Stochastic Gradient Descent [article]

Yuege Xie, Xiaoxia Wu, Rachel Ward
2020 arXiv   pre-print
We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of either strongly convex functions or non-convex functions that  ...  On top of RUIG, we develop a two-stage framework to prove the linear convergence of AdaGrad-Norm without knowing the parameters of the objective functions.  ...  The paper focuses on the robustness of the linear convergence of adaptive stochastic gradient descent to unknown hyperparameters. Adaptive gradient descent methods introduced in Duchi et al.  ... 
arXiv:1908.10525v2 fatcat:a4ds5rawxjfjjazwvqss3rejsu
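The AdaGrad-Norm update analyzed in this paper has a compact form; below is a minimal sketch (the function names, the toy quadratic, and the constants eta and b0 are illustrative, not taken from the paper):

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, steps=200):
    """AdaGrad-Norm: one scalar step size adapted from the running sum of
    squared gradient *norms*: b_{t+1}^2 = b_t^2 + ||g_t||^2,
    x_{t+1} = x_t - (eta / b_{t+1}) g_t."""
    x = np.asarray(x0, dtype=float)
    b2 = b0 ** 2
    for _ in range(steps):
        g = grad(x)
        b2 += g @ g                      # accumulate squared gradient norm
        x = x - (eta / np.sqrt(b2)) * g  # single adaptive scalar step size
    return x

# Toy strongly convex quadratic f(x) = 0.5 ||x||^2; its gradient is x itself.
x_star = adagrad_norm(lambda x: x, x0=np.ones(5))
```

On this toy problem the accumulated norm stays bounded, so the effective step size stops shrinking and the iterates contract geometrically, which is the linear-convergence behavior the paper establishes.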

Using Curvature Information for Fast Stochastic Search

Genevieve B. Orr, Todd K. Leen
1996 Neural Information Processing Systems  
We present an algorithm for fast stochastic gradient descent that uses a nonlinear adaptive momentum scheme to optimize the late time convergence rate.  ...  The algorithm makes effective use of curvature information, requires only O(n) storage and computation, and delivers convergence rates close to the theoretical optimum.  ...  Implementations of stochastic learning typically use a constant learning rate during the early part of training (what Darken and Moody  ... )  ...  Momentum in Stochastic Gradient Descent: The adaptive momentum algorithm  ... 
dblp:conf/nips/OrrL96 fatcat:uiwsuxbydnhgjffwdtz37icxru

Stochastic Learning [chapter]

Léon Bottou
2004 Lecture Notes in Computer Science  
This contribution presents an overview of the theoretical and practical aspects of the broad family of learning algorithms based on Stochastic Gradient Descent, including Perceptrons, Adalines, K-Means  ...  Stochastic gradient descent benefits from the redundancies of the training set.  ...  The convergence proofs for both the discrete (3) and stochastic (4) gradient descents follow the same three steps.  ... 
doi:10.1007/978-3-540-28650-9_7 fatcat:a7fava6fibcwdi5dd4uh6u2qym

Adaptive minimum-BER linear multiuser detection for DS-CDMA signals in multipath channels

Sheng Chen, A.K. Samingan, B. Mulgrew, L. Hanzo
2001 IEEE Transactions on Signal Processing  
Index Terms-Adaptive algorithms, linear multiuser detectors, minimum bit error rate, minimum mean square error, stochastic gradient algorithms.  ...  Based on the approach of kernel density estimation for approximating the bit error rate (BER) from training data, a least mean squares (LMS) style stochastic gradient adaptive algorithm is developed for  ...  There are two stochastic gradient adaptive algorithms for realizing the MBER linear multiuser detector in the literature [8] , [9] .  ... 
doi:10.1109/78.923306 fatcat:wb3citpblfcb3fbgwyirci64oq

Linear Convergence of Generalized Mirror Descent with Time-Dependent Mirrors [article]

Adityanarayanan Radhakrishnan and Mikhail Belkin and Caroline Uhler
2021 arXiv   pre-print
While several recent works use a PL-based analysis to establish linear convergence of stochastic gradient descent methods, the question remains as to whether a similar analysis can be conducted for more  ...  The Polyak-Lojasiewicz (PL) inequality is a sufficient condition for establishing linear convergence of gradient descent, even in non-convex settings.  ...  Linear Convergence of Adaptive Stochastic Gradient Descent. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. [23] Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li.  ... 
arXiv:2009.08574v2 fatcat:ei5lvpwr4ne6rjzktw4p5etilm
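The PL inequality referenced in this entry has a compact standard form; the constants μ (PL) and L (smoothness) below are generic, not taken from the paper:

```latex
% Polyak-Lojasiewicz inequality: for some mu > 0 and all x,
\[
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr).
\]
% For L-smooth f, gradient descent with step size 1/L then contracts linearly:
\[
f(x_{t+1}) - f^* \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_t) - f^*\bigr).
\]
```

Note that PL does not require convexity, which is why it appears in the non-convex analyses cited throughout these results.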

Automated Inference with Adaptive Batches

Soham De, Abhay Kumar Yadav, David W. Jacobs, Tom Goldstein
2017 International Conference on Artificial Intelligence and Statistics  
The resulting methods have similar convergence rates to classical SGD, and do not require convexity of the objective.  ...  Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution.  ...  Jacobs were supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 2014-14071600012.  ... 
dblp:conf/aistats/DeYJG17 fatcat:dlclwyfnqja4fa5z5yeo6djhcu

Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize [article]

Ryan D'Orazio, Nicolas Loizou, Issam Laradji, Ioannis Mitliagkas
2021 arXiv   pre-print
We investigate the convergence of stochastic mirror descent (SMD) in relatively smooth and smooth convex optimization.  ...  the benefits of mirror descent.  ...  methods such as stochastic projected gradient descent (SPGD).  ... 
arXiv:2110.15412v2 fatcat:dd2rm6ovonc25ewynh7tfwvn44
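The (unmirrored) stochastic Polyak stepsize underlying the paper's adaptive variants can be sketched as follows; the least-squares data and the interpolation assumption f_i* = 0 are illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((128, 4))
w_true = rng.standard_normal(4)
b = A @ w_true          # interpolation: every f_i is minimized at w_true

def loss_grad(w, i):
    """f_i(w) = 0.5 (a_i . w - b_i)^2 and its gradient."""
    r = A[i] @ w - b[i]
    return 0.5 * r * r, r * A[i]

# Stochastic Polyak stepsize with f_i^* = 0 (holds under interpolation):
#   gamma_t = f_i(w_t) / ||grad f_i(w_t)||^2
w = np.zeros(4)
for _ in range(2000):
    i = rng.integers(len(A))
    f, g = loss_grad(w, i)
    g2 = g @ g
    if g2 > 0:
        w -= (f / g2) * g
```

On a sampled row the step halves the residual (a relaxed Kaczmarz projection), so no step-size tuning is needed; this is the adaptivity the mirror-descent variant generalizes.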

Adaptive Term Weighting through Stochastic Optimization [chapter]

Michael Granitzer
2010 Lecture Notes in Computer Science  
Via stochastic optimization we determine a linear transformation of the term space to approximate expected similarity values among documents.  ...  We evaluate our approach on 18 standard text data sets and show that the performance improvement of a k-NN classifier ranges between 1% and 12% by using adaptive term weighting as preprocessing step.  ...  and Technology, the Austrian Ministry of Economics and Labor and by the State of Styria.  ... 
doi:10.1007/978-3-642-12116-6_52 fatcat:pepvkuc225fdxipyinm22vdsza

Page 145 of Neural Computation Vol. 4, Issue 2 [page]

1992 Neural Computation  
The convergence properties of the LMS algorithm with adaptive learning rate are presented in Luo (1991), together with a clear comparison of the LMS algorithm with stochastic gradient descent and adaptive  ...  The result is that the convergence of stochastic LMS is guaranteed if μ < 1/(N√λ_max), where N is the number of parameters being optimized and λ_max is the largest eigenvalue of the autocorrelation function  ... 
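The quoted step-size bound can be checked numerically; a small sketch, where the toy input process, the target weights, and the choice of μ at half the bound are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                    # number of adaptive parameters
# Toy input process with known per-coordinate scales.
X = rng.standard_normal((10_000, N)) @ np.diag([2.0, 1.0, 0.5, 0.25])

# Autocorrelation matrix and its largest eigenvalue lambda_max.
R = X.T @ X / len(X)
lam_max = np.linalg.eigvalsh(R).max()

# Step-size bound as quoted: mu < 1 / (N * sqrt(lambda_max)).
mu_bound = 1.0 / (N * np.sqrt(lam_max))

# Run LMS well below the bound on d = X w* + noise.
w_true = np.array([1.0, -2.0, 0.5, 3.0])
d = X @ w_true + 0.01 * rng.standard_normal(len(X))
w = np.zeros(N)
mu = 0.5 * mu_bound
for x_t, d_t in zip(X, d):
    e = d_t - x_t @ w        # a-priori error
    w += mu * e * x_t        # LMS update
```

With μ inside the stability region the filter weights settle near the true values, with a small noise-induced misadjustment.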

A Game-Theoretic Model for Co-Adaptive Brain-Machine Interfaces [article]

Maneeshika M. Madduri, Samuel A. Burden, Amy L. Orsborn
2020 bioRxiv   pre-print
Assuming the brain and the decoder adapt using gradient-based schemes, we analytically show how convergence to these equilibria depends on agent learning rates.  ...  We frame our BMI model as a potential game to identify stationary points (Nash equilibria) of the brain-decoder interactions, which correspond to points at which both the brain and the decoder stop adapting  ...  The learning dynamics of co-adaptation in these models either propose a stochastic coordinate descent, where the decoder learns to anticipate the brain's adaptation [11] or a stochastic gradient descent  ... 
doi:10.1101/2020.12.11.421800 fatcat:hwd2plgdy5a4zloifwmzx5z4ay

New Adaptive Linear Discriminant Analysis for Face Recognition with SVM

Mehdi Ghayoumi
2008 Zenodo  
The new algorithm has the advantage of optimal selection of the step size. The gradient descent method and the new algorithm have been implemented in software and evaluated on the Yale face database B.  ...  The eigenfaces of these approaches have been used to train a KNN. The recognition rate of the new algorithm is compared with that of gradient descent.  ...  In Section 3, the adaptive computation of the square root of the inverse covariance matrix Σ^(−1/2) based on the gradient descent method is presented, and its convergence is proved using the stochastic  ... 
doi:10.5281/zenodo.1079331 fatcat:ejpn33orgva65l7dtlmrs5gzsa

Conjugate Directions for Stochastic Gradient Descent [chapter]

Nicol N. Schraudolph, Thore Graepel
2002 Lecture Notes in Computer Science  
In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.  ...  The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent.  ...  Fig. 2 (right) shows that stochastic descent in direction of c (dashed) indeed sports much better late convergence than steepest descent (dotted).  ... 
doi:10.1007/3-540-46084-5_218 fatcat:mye3y4evavggdekufwafdd53ba

Recent Improvements of Gradient Descent Method for Optimization

Shweta Agrawal, Ravishek Kumar Singh
2022 International Journal of Computer Applications  
The purpose of the gradient descent technique is to adjust a set of parameters iteratively until optimal parameters are reached.  ...  Gradient descent is the most common method used for optimization; it is one of the optimization techniques applied when a machine-learning-based model or algorithm is trained.  ...  TYPES OF GRADIENT DESCENT The three most commonly used types of gradient descent are [11, 12, 14]: 1. Batch gradient descent 2. Stochastic gradient descent 3.  ... 
doi:10.5120/ijca2022921908 fatcat:jebgvlj5hrebzp26fx7evxq5d4
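The variants listed in this entry differ only in how many samples enter each gradient estimate; a minimal least-squares sketch (the data, learning rate, and batch sizes are illustrative, and mini-batch is assumed as the third, truncated item):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((256, 3))        # toy design matrix
w_true = np.array([1.0, -1.0, 2.0])
b = A @ w_true                           # noiseless targets

def grad(w, idx):
    """Gradient of f(w) = 0.5/|idx| * ||A[idx] w - b[idx]||^2."""
    r = A[idx] @ w - b[idx]
    return A[idx].T @ r / len(idx)

def descend(batch_size, lr=0.05, epochs=200):
    w = np.zeros(3)
    for _ in range(epochs):
        order = rng.permutation(len(A))
        for s in range(0, len(A), batch_size):
            w -= lr * grad(w, order[s:s + batch_size])
    return w

w_batch = descend(batch_size=len(A))  # 1. batch gradient descent
w_sgd   = descend(batch_size=1)       # 2. stochastic gradient descent
w_mini  = descend(batch_size=32)      # 3. mini-batch (assumed third variant)
```

All three recover the same minimizer here; the trade-off is per-step cost (batch) versus gradient noise (stochastic), with mini-batch in between.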

A State-of-the-art Survey of Advanced Optimization Methods in Machine Learning

Muhamet Kastrati, Marenglen Biba
2021 International Conference on Recent Trends and Applications in Computer Science and Information Technology  
The main objective of this paper is to provide a state-of-the-art survey of advanced optimization methods used in machine learning.  ...  Then optimization is presented along with a review of the most recent state-of-the-art methods and algorithms that are being extensively used in machine learning in general and deep neural networks in  ...  We hope that the issues discussed in this paper will push forward the discussion in the area of optimization and machine learning; at the same time, it may serve as complementary material for other researchers  ... 
dblp:conf/rtacsit/KastratiB21 fatcat:gcvz6va2wrdgvcorfb52qdac4q

Why gradient clipping accelerates training: A theoretical justification for adaptivity [article]

Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie
2020 arXiv   pre-print
Under the new condition, we prove that two popular methods, namely, gradient clipping and normalized gradient, converge arbitrarily faster than gradient descent with fixed stepsize.  ...  We further explain why such adaptively scaled gradient methods can accelerate empirical convergence and verify our results empirically in popular neural network training settings.  ...  Second, we only study convergence of clipped gradient descent.  ... 
arXiv:1905.11881v2 fatcat:65bm7j4dlrgkdjl3bzlwc3u32e
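The clipping rule this paper analyzes can be shown in one dimension; the toy objective f(x) = x^4, learning rate, and threshold below are illustrative choices, not the paper's experiments:

```python
def grad(x):                 # f(x) = x**4: gradient grows as 4 x^3
    return 4.0 * x ** 3

def clipped_gd(x, lr=0.5, threshold=1.0, steps=500):
    """Clipped gradient descent: shrink g to norm `threshold` whenever
    |g| exceeds it, so every step is bounded by lr * threshold."""
    for _ in range(steps):
        g = grad(x)
        if abs(g) > threshold:
            g *= threshold / abs(g)
        x -= lr * g
    return x

x_final = clipped_gd(2.0)
# With the same lr = 0.5 but no clipping, the first step from x = 2 lands
# at 2 - 0.5 * 32 = -14 and the iterates blow up; clipping stays stable.
```

This illustrates the paper's point: on objectives whose smoothness grows with the gradient, a fixed step size must be tiny to be safe, while the clipped step adapts automatically.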