Linear Convergence of Adaptive Stochastic Gradient Descent
[article]
2020
arXiv
pre-print
We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of either strongly convex functions or non-convex functions that ...
On top of RUIG, we develop a two-stage framework to prove the linear convergence of AdaGrad-Norm without knowing the parameters of the objective functions. ...
The paper focuses on the robustness of the linear convergence of adaptive stochastic gradient descent to unknown hyperparameters. Adaptive gradient descent methods introduced in Duchi et al. ...
arXiv:1908.10525v2
fatcat:a4ds5rawxjfjjazwvqss3rejsu
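For context on the entry above: AdaGrad-Norm keeps a single scalar step size driven by the accumulated squared gradient norms. The sketch below is a minimal Python illustration under that reading; the function name, defaults, and toy objective are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-8, n_steps=1000):
    """Minimal AdaGrad-Norm sketch: one scalar step size adapted from the
    running sum of squared gradient norms (illustrative helper)."""
    x = np.asarray(x0, dtype=float)
    b2 = b0 ** 2
    for _ in range(n_steps):
        g = grad(x)                      # stochastic or full gradient
        b2 += np.dot(g, g)               # accumulate squared gradient norm
        x = x - (eta / np.sqrt(b2)) * g  # single scalar step size for all coords
    return x

# Usage: minimize the strongly convex quadratic f(x) = 0.5 * ||x||^2.
print(adagrad_norm(lambda x: x, x0=np.ones(5)))
```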
Using Curvature Information for Fast Stochastic Search
1996
Neural Information Processing Systems
We present an algorithm for fast stochastic gradient descent that uses a nonlinear adaptive momentum scheme to optimize the late time convergence rate. ...
The algorithm makes effective use of curvature information, requires only O(n) storage and computation, and delivers convergence rates close to the theoretical optimum. ...
Implementations of stochastic learning typically use a constant learning rate during the early part of training (what Darken and Moody ...
Momentum in Stochastic Gradient Descent The adaptive momentum algorithm ...
dblp:conf/nips/OrrL96
fatcat:uiwsuxbydnhgjffwdtz37icxru
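The entry above speeds up late-time convergence by adapting the momentum term from curvature information. As a baseline for what is being adapted, here is a plain heavy-ball SGD loop in Python; the curvature-driven choice of the momentum coefficient from the paper is not reproduced, and all names and constants are illustrative.

```python
import numpy as np

def sgd_momentum(grad, x0, eta=0.1, beta=0.9, n_steps=500):
    """Plain heavy-ball SGD shown for context; the paper adapts beta from
    curvature estimates, which this sketch does not attempt."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(n_steps):
        g = grad(x)
        v = beta * v - eta * g   # momentum accumulates past gradients
        x = x + v
    return x

# Toy quadratic f(x) = ||x||^2, whose gradient is 2x.
print(sgd_momentum(lambda x: 2 * x, x0=np.array([3.0, -2.0])))
```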
Stochastic Learning
[chapter]
2004
Lecture Notes in Computer Science
This contribution presents an overview of the theoretical and practical aspects of the broad family of learning algorithms based on Stochastic Gradient Descent, including Perceptrons, Adalines, K-Means ...
Stochastic gradient descent benefits from the redundancies of the training set. ...
The convergence proofs for both the discrete (3) and stochastic (4) gradient descents follow the same three steps. ...
doi:10.1007/978-3-540-28650-9_7
fatcat:a7fava6fibcwdi5dd4uh6u2qym
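Since the chapter above surveys the generic stochastic gradient recipe (Perceptrons, Adalines, and K-Means all fit the same template), a minimal Adaline-style SGD loop may help fix ideas; it is a sketch of that general recipe, not code from the chapter.

```python
import numpy as np

def sgd_adaline(X, y, eta=0.01, epochs=20, seed=0):
    """One-sample-at-a-time SGD on the squared error of a linear unit
    (an Adaline-style update); names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):   # visit examples in random order
            err = y[i] - X[i] @ w           # prediction error on one sample
            w += eta * err * X[i]           # stochastic gradient step
    return w

X = np.random.default_rng(1).normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5])
print(sgd_adaline(X, y))   # should approach [1.0, -2.0, 0.5]
```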
Adaptive minimum-BER linear multiuser detection for DS-CDMA signals in multipath channels
2001
IEEE Transactions on Signal Processing
Index Terms-Adaptive algorithms, linear multiuser detectors, minimum bit error rate, minimum mean square error, stochastic gradient algorithms. ...
Based on the approach of kernel density estimation for approximating the bit error rate (BER) from training data, a least mean squares (LMS) style stochastic gradient adaptive algorithm is developed for ...
There are two stochastic gradient adaptive algorithms for realizing the MBER linear multiuser detector in the literature [8], [9]. ...
doi:10.1109/78.923306
fatcat:wb3citpblfcb3fbgwyirci64oq
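The entry above develops an LMS-style stochastic gradient algorithm driven by a kernel density estimate of the BER. For orientation, the sketch below shows the ordinary MMSE-style LMS adaptive filter that the "LMS-style" description alludes to; the MBER-specific gradient is not implemented, and the channel-identification example is purely illustrative.

```python
import numpy as np

def lms_filter(x, d, n_taps=4, mu=0.01):
    """Generic least-mean-squares adaptive filter: a stochastic gradient
    update on the instantaneous squared error, shown for context only."""
    w = np.zeros(n_taps)
    for n in range(n_taps - 1, len(d)):
        u = x[n - n_taps + 1 : n + 1][::-1]  # current and past input samples
        e = d[n] - w @ u                     # instantaneous error
        w += mu * e * u                      # LMS weight update
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
h = np.array([0.5, -0.3, 0.1])               # unknown FIR channel to identify
d = np.convolve(x, h)[: len(x)]
print(lms_filter(x, d))                      # approaches [0.5, -0.3, 0.1, 0.0]
```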
Linear Convergence of Generalized Mirror Descent with Time-Dependent Mirrors
[article]
2021
arXiv
pre-print
While several recent works use a PL-based analysis to establish linear convergence of stochastic gradient descent methods, the question remains as to whether a similar analysis can be conducted for more ...
The Polyak-Lojasiewicz (PL) inequality is a sufficient condition for establishing linear convergence of gradient descent, even in non-convex settings. ...
Linear Convergence of Adaptive Stochastic Gradient Descent. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. [23] Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li. ...
arXiv:2009.08574v2
fatcat:ei5lvpwr4ne6rjzktw4p5etilm
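The entry above generalizes mirror descent to time-dependent mirrors under a PL-type condition. As a reference point, here is standard mirror descent with a fixed negative-entropy mirror map (exponentiated gradient) on the probability simplex; the time-dependent generalization from the paper is not attempted, and the toy objective is an assumption.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, eta=0.1, n_steps=200):
    """Mirror descent with the negative-entropy mirror map on the simplex
    (exponentiated gradient); a fixed-mirror sketch only."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad(x)
        x = x * np.exp(-eta * g)   # mirror step in the dual (log) space
        x /= x.sum()               # Bregman projection back onto the simplex
    return x

# Minimize a linear objective <c, x> over the simplex; mass concentrates on argmin(c).
c = np.array([0.3, 0.1, 0.5])
print(entropic_mirror_descent(lambda x: c, np.ones(3) / 3))
```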
Automated Inference with Adaptive Batches
2017
International Conference on Artificial Intelligence and Statistics
The resulting methods have similar convergence rates to classical SGD, and do not require convexity of the objective. ...
Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. ...
Jacobs were supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 2014-14071600012. ...
dblp:conf/aistats/DeYJG17
fatcat:dlclwyfnqja4fa5z5yeo6djhcu
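The entry above grows the batch size so that gradient approximations stay accurate as iterates approach a solution. The sketch below illustrates one simple variance-based growth rule; the test, constants, and function names are assumptions for illustration, not the paper's exact criterion.

```python
import numpy as np

def adaptive_batch_sgd(grad_sample, x0, eta=0.1, batch0=8, theta=1.0,
                       n_iters=100, max_batch=1024, seed=0):
    """Illustrative adaptive-batch scheme: double the batch whenever the
    variance of the mean gradient estimate dominates its squared norm,
    so the descent direction stays trustworthy."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    batch = batch0
    for _ in range(n_iters):
        grads = np.stack([grad_sample(x, rng) for _ in range(batch)])
        g_mean = grads.mean(axis=0)
        g_var = grads.var(axis=0).sum() / batch   # variance of the mean estimate
        if g_var > theta * np.dot(g_mean, g_mean) and batch < max_batch:
            batch *= 2                            # estimate too noisy: enlarge batch
        x = x - eta * g_mean
    return x, batch

# Toy problem: noisy gradients of f(x) = 0.5 * ||x||^2.
noisy_grad = lambda x, rng: x + rng.normal(scale=0.5, size=x.shape)
print(adaptive_batch_sgd(noisy_grad, np.ones(4)))
```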
Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize
[article]
2021
arXiv
pre-print
We investigate the convergence of stochastic mirror descent (SMD) in relatively smooth and smooth convex optimization. ...
the benefits of mirror descent. ...
methods such as stochastic projected gradient descent (SPGD). ...
arXiv:2110.15412v2
fatcat:dd2rm6ovonc25ewynh7tfwvn44
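The entry above builds adaptive variants of SMD around the mirror stochastic Polyak stepsize. The Euclidean version of that stepsize is easy to sketch for least squares under an interpolation assumption (each per-sample optimum equal to zero); the Bregman/mirror form from the paper is not shown, and the cap and constants are illustrative.

```python
import numpy as np

def sgd_polyak_stepsize(A, b, x0, c=0.5, eta_max=1.0, epochs=30, seed=0):
    """SGD with a capped stochastic Polyak stepsize on least squares,
    assuming f_i* = 0 for every sample (interpolation)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(b)):
            r = A[i] @ x - b[i]
            f_i = 0.5 * r ** 2                      # per-sample loss
            g = r * A[i]                            # per-sample gradient
            sq = g @ g
            if sq > 0:
                eta = min(f_i / (c * sq), eta_max)  # capped Polyak stepsize
                x = x - eta * g
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 3))
x_true = np.array([1.0, -1.0, 2.0])
print(sgd_polyak_stepsize(A, A @ x_true, np.zeros(3)))   # approaches x_true
```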
Adaptive Term Weighting through Stochastic Optimization
[chapter]
2010
Lecture Notes in Computer Science
Via stochastic optimization we determine a linear transformation of the term space to approximate expected similarity values among documents. ...
We evaluate our approach on 18 standard text data sets and show that the performance improvement of a k-NN classifier ranges between 1% and 12% by using adaptive term weighting as preprocessing step. ...
and Technology, the Austrian Ministry of Economics and Labor and by the State of Styria. ...
doi:10.1007/978-3-642-12116-6_52
fatcat:pepvkuc225fdxipyinm22vdsza
Page 145 of Neural Computation Vol. 4, Issue 2
[page]
1992
Neural Computation
The convergence properties of the LMS algorithm with adaptive learning rate are presented in Luo (1991), together with a clear comparison of the LMS algorithm with stochastic gradient descent and adaptive ...
The result is that the convergence of stochastic LMS is guaranteed if ε < 1/(N√λmax), where N is the number of parameters being optimized and λmax is the largest eigenvalue of the autocorrelation function ...
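The quoted condition depends on two quantities: the number of parameters N and λmax, the largest eigenvalue of the input autocorrelation matrix. The sketch below only estimates those two ingredients from sample data; the constant in the bound itself is taken from the snippet and should be treated as indicative.

```python
import numpy as np

# Estimate N and lambda_max for a toy input stream; the printed bound simply
# plugs them into the condition quoted in the snippet above.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))        # 8-tap input vectors
R = X.T @ X / len(X)                  # sample autocorrelation matrix
lam_max = np.linalg.eigvalsh(R).max()
N = X.shape[1]
print("lambda_max ~", lam_max)
print("step-size bound ~", 1.0 / (N * np.sqrt(lam_max)))
```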
A Game-Theoretic Model for Co-Adaptive Brain-Machine Interfaces
[article]
2020
bioRxiv
pre-print
Assuming the brain and the decoder adapt using gradient-based schemes, we analytically show how convergence to these equilibria depends on agent learning rates. ...
We frame our BMI model as a potential game to identify stationary points (Nash equilibria) of the brain-decoder interactions, which correspond to points at which both the brain and the decoder stop adapting ...
The learning dynamics of co-adaptation in these models either propose a stochastic coordinate descent, where the decoder learns to anticipate the brain's adaptation [11] or a stochastic gradient descent ...
doi:10.1101/2020.12.11.421800
fatcat:hwd2plgdy5a4zloifwmzx5z4ay
New Adaptive Linear Discriminante Analysis For Face Recognition With Svm
2008
Zenodo
The new algorithm has the advantage of optimal selection of the step size. The gradient descent method and the new algorithm have been implemented in software and evaluated on the Yale face database B. ...
The eigenfaces of these approaches have been used to train a KNN. The recognition rate of the new algorithm is compared with that of gradient descent. ...
In Section 3, the adaptive computation of the square root of the inverse covariance matrix Σ^(-1/2) based on the gradient descent method is presented and its convergence is proved using the stochastic ...
doi:10.5281/zenodo.1079331
fatcat:ejpn33orgva65l7dtlmrs5gzsa
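Section 3 of the entry above computes the inverse square root of the covariance matrix adaptively by gradient descent. A common fixed-point iteration with that limit is sketched below; whether it matches the paper's exact recursion is an assumption, and the step size and sample data are illustrative.

```python
import numpy as np

def inv_sqrt_cov(samples, eta=0.05, n_iters=500):
    """Gradient-style fixed-point iteration W <- W + eta*(I - W @ Sigma @ W),
    whose fixed point is Sigma^(-1/2); a sketch, not the paper's recursion."""
    sigma = np.cov(samples, rowvar=False)
    n = sigma.shape[0]
    W = np.eye(n)
    for _ in range(n_iters):
        W = W + eta * (np.eye(n) - W @ sigma @ W)
    return W

rng = np.random.default_rng(0)
samples = rng.normal(size=(2000, 3)) @ np.diag([1.0, 2.0, 0.5])
W = inv_sqrt_cov(samples)
sigma = np.cov(samples, rowvar=False)
print(W @ sigma @ W)   # should be close to the identity matrix
```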
Conjugate Directions for Stochastic Gradient Descent
[chapter]
2002
Lecture Notes in Computer Science
In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent. ...
The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. ...
Fig. 2 (right) shows that stochastic descent in the direction of c (dashed) indeed sports much better late convergence than steepest descent (dotted). ...
doi:10.1007/3-540-46084-5_218
fatcat:mye3y4evavggdekufwafdd53ba
Recent Improvements of Gradient Descent Method for Optimization
2022
International Journal of Computer Applications
The purpose of the gradient descent technique is to adjust a set of parameters step by step until optimal parameters are reached. ...
Gradient descent is the most common method used for optimization. It is one of the optimization techniques applied when a machine learning model or algorithm is trained. ...
TYPES OF GRADIENT DESCENT The three most common types of gradient descent are [11, 12, 14]: 1. Batch gradient descent 2. Stochastic gradient descent 3. ...
doi:10.5120/ijca2022921908
fatcat:jebgvlj5hrebzp26fx7evxq5d4
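The variants listed in the entry above differ only in how many examples feed each update (the truncated third item is assumed here to be mini-batch gradient descent). One parameterized loop covers all of them for least squares; names and defaults are illustrative.

```python
import numpy as np

def gradient_descent(X, y, batch_size=None, eta=0.1, epochs=50, seed=0):
    """Least-squares gradient descent: batch_size=None -> full batch,
    1 -> stochastic, k -> mini-batch (assumed third variant)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    bs = n if batch_size is None else batch_size
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, bs):
            b = idx[start:start + bs]
            g = X[b].T @ (X[b] @ w - y[b]) / len(b)   # gradient on this batch
            w -= eta * g
    return w

X = np.random.default_rng(1).normal(size=(200, 2))
y = X @ np.array([2.0, -1.0])
for bs in (None, 1, 32):          # batch, stochastic, mini-batch
    print(bs, gradient_descent(X, y, batch_size=bs))
```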
A State-of-the-art Survey of Advanced Optimization Methods in Machine Learning
2021
International Conference on Recent Trends and Applications in Computer Science and Information Technology
The main objective of this paper is to provide a state-of-the-art survey of advanced optimization methods used in machine learning. ...
Then optimization is presented along with a review of the most recent state-of-the-art methods and algorithms that are being extensively used in machine learning in general and deep neural networks in ...
We hope that the issues discussed in this paper will push forward the discussion in the area of optimization and machine learning, while at the same time serving as complementary material for other researchers ...
dblp:conf/rtacsit/KastratiB21
fatcat:gcvz6va2wrdgvcorfb52qdac4q
Why gradient clipping accelerates training: A theoretical justification for adaptivity
[article]
2020
arXiv
pre-print
Under the new condition, we prove that two popular methods, namely, gradient clipping and normalized gradient, converge arbitrarily faster than gradient descent with fixed stepsize. ...
We further explain why such adaptively scaled gradient methods can accelerate empirical convergence and verify our results empirically in popular neural network training settings. ...
Second, we only study convergence of clipped gradient descent. ...
arXiv:1905.11881v2
fatcat:65bm7j4dlrgkdjl3bzlwc3u32e
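The entry above analyses gradient clipping and normalized gradient as adaptively scaled methods. Below is a minimal clipping sketch in Python, with an illustrative steep objective where plain gradient descent with the same fixed stepsize would diverge from the chosen starting point.

```python
import numpy as np

def clipped_gd(grad, x0, eta=0.1, clip=1.0, n_steps=500):
    """Clipped gradient descent: rescale the gradient whenever its norm
    exceeds a threshold, so the effective stepsize adapts to gradient size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)   # cap the gradient norm at `clip`
        x = x - eta * g
    return x

# f(x) = x^4 has gradient 4x^3; from x0 = 3 an unclipped step of eta = 0.1
# overshoots badly, while clipping keeps the early steps bounded.
print(clipped_gd(lambda x: 4 * x ** 3, np.array([3.0])))
```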
Showing results 1 — 15 out of 35,897 results