Implicit Gradient Regularization
[article]
2021
arXiv
pre-print
We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients. ...
We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization. ...
Implicit gradient regularization is the implicit regularization behaviour originating from the use of discrete update steps in gradient descent, as characterized by Equation 2. ...
arXiv:2009.11162v2
fatcat:f2ylmi3eoncvjjavna3ossub6i
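A minimal numerical sketch of the gradient-norm penalty described in the entry above, assuming a modified loss of the form L(θ) + (h/4)·‖∇L(θ)‖² with step size h; the least-squares loss, the synthetic data, and the function names below are illustrative assumptions, and the paper's Equation 2 is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

    def loss(theta):
        # Ordinary least-squares loss.
        r = X @ theta - y
        return 0.5 * np.mean(r ** 2)

    def grad(theta):
        # Analytic gradient of the least-squares loss.
        return X.T @ (X @ theta - y) / len(y)

    def modified_loss(theta, h):
        # Assumed IGR-style surrogate: the original loss plus a penalty on
        # the squared gradient norm, scaled by the step size h.
        g = grad(theta)
        return loss(theta) + (h / 4.0) * (g @ g)

    theta = np.zeros(5)
    print("loss:", loss(theta), "modified loss:", modified_loss(theta, h=0.1))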
Regularization in network optimization via trimmed stochastic gradient descent with noisy label
[article]
2022
arXiv
pre-print
The label noise provides strong implicit regularization by replacing the ground-truth labels of training examples with uniform random labels. ...
Regularization is essential for avoiding over-fitting to training data in network optimization, leading to better generalization of the trained networks. ...
The neural network model is trained using the stochastic gradient descent (SGD) and its variants in combination with explicit and implicit regularization methods. ...
arXiv:2012.11073v3
fatcat:si5b25mbqzea5aytmxlami55gy
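A small sketch of the label-noise mechanism described in the entry above, assuming the simple scheme of replacing a fixed fraction of ground-truth labels with labels drawn uniformly at random; the noise fraction, class count, and function name are illustrative assumptions, not values from the paper.

    import numpy as np

    def add_uniform_label_noise(labels, num_classes, noise_fraction, rng):
        # Replace a random subset of labels with uniform random class labels.
        labels = labels.copy()
        n = len(labels)
        idx = rng.choice(n, size=int(noise_fraction * n), replace=False)
        labels[idx] = rng.integers(0, num_classes, size=len(idx))
        return labels

    rng = np.random.default_rng(0)
    clean = rng.integers(0, 10, size=1000)            # e.g. a 10-class problem
    noisy = add_uniform_label_noise(clean, 10, 0.2, rng)
    print("fraction of labels changed:", np.mean(clean != noisy))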
The Implicit Regularization of Stochastic Gradient Flow for Least Squares
[article]
2020
arXiv
pre-print
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. ...
We give a bound on the excess risk of stochastic gradient flow at time t, over ridge regression with tuning parameter λ = 1/t. ...
of the current work on implicit regularization. ...
arXiv:2003.07802v2
fatcat:req5g6wjlncsbexknakh7kbyii
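A minimal check of the correspondence stated in the entry above, comparing gradient flow on least squares at time t with ridge regression at λ = 1/t. The paper's bound concerns stochastic gradient flow; this deterministic, closed-form comparison on synthetic data is only an illustrative assumption.

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    n, p = 200, 5
    X = rng.normal(size=(n, p))
    y = X @ rng.normal(size=p) + rng.normal(size=n)

    A = X.T @ X / n          # Hessian of the least-squares loss
    b = X.T @ y / n

    def gradient_flow(t):
        # Closed-form solution of d(beta)/dt = -(A beta - b), beta(0) = 0.
        return np.linalg.solve(A, (np.eye(p) - expm(-t * A)) @ b)

    def ridge(lam):
        # Ridge estimator for the penalty (lam/2) * ||beta||^2.
        return np.linalg.solve(A + lam * np.eye(p), b)

    for t in [1.0, 10.0, 100.0]:
        diff = np.linalg.norm(gradient_flow(t) - ridge(1.0 / t))
        print(f"t={t:6.1f}  ||beta_gf(t) - beta_ridge(1/t)|| = {diff:.4f}")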
SGD Implicitly Regularizes Generalization Error
[article]
2021
arXiv
pre-print
We then compare the change in the test error for stochastic gradient descent to the change in test error from an equivalent number of gradient descent updates and show explicitly that stochastic gradient ...
These calculations depend on the details of the model only through the mean and covariance of the gradient distribution, which may be readily measured for particular models of interest. ...
This extended abstract was brought to you by the letter Σ after averaging over many different realizations. ...
arXiv:2104.04874v1
fatcat:suctw27psffa5chxf2w2yup4m4
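The entry above notes that the calculations depend on the model only through the mean and covariance of the gradient distribution; below is a minimal sketch of measuring those two quantities for a linear least-squares model. The model, data, and parameter values are illustrative assumptions, not the paper's setup.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = X @ rng.normal(size=4) + rng.normal(size=500)
    theta = np.zeros(4)

    # Per-example gradients of the squared-error loss 0.5 * (x.theta - y)^2.
    residuals = X @ theta - y                    # shape (500,)
    per_example_grads = residuals[:, None] * X   # shape (500, 4)

    grad_mean = per_example_grads.mean(axis=0)
    grad_cov = np.cov(per_example_grads, rowvar=False)
    print("gradient mean:", grad_mean)
    print("gradient covariance shape:", grad_cov.shape)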
Regularization in Network Optimization via Trimmed Stochastic Gradient Descent With Noisy Label
2022
IEEE Access
The label noise provides strong implicit regularization by replacing the ground-truth labels of training examples with uniform random labels. ...
Regularization is essential for avoiding over-fitting to training data in network optimization, leading to better generalization of the trained networks. ...
The neural network model is trained using the stochastic gradient descent (SGD) and its variants in combination with explicit and implicit regularization methods. ...
doi:10.1109/access.2022.3171910
fatcat:cjw5livwqbhlrithvtsplgbcju
Stochastic Training is Not Necessary for Generalization
[article]
2022
arXiv
pre-print
It is widely believed that the implicit regularization of SGD is fundamental to the impressive generalization behavior we observe in neural networks. ...
In this work, we demonstrate that non-stochastic full-batch training can achieve comparably strong performance to SGD on CIFAR-10 using modern architectures. ...
Arguably this means the regularization is applied explicitly on top of the implicit effect that is already present. ...
arXiv:2109.14119v2
fatcat:izkob2pvcfefhaqospgdzjnr7e
Towards stability and optimality in stochastic gradient descent
[article]
2016
arXiv
pre-print
Iterative procedures for parameter estimation based on stochastic gradient descent allow the estimation to scale to massive data sets. ...
In practice, AI-SGD achieves competitive performance with other state-of-the-art procedures. ...
gradient descent (implicit SGD). ...
arXiv:1505.02417v4
fatcat:tdbkb6kx5vgaxev5fmzwbwvfz4
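A minimal sketch of the implicit SGD step mentioned in the entry above, for the squared loss of a linear model, where the implicit update theta_new = theta - gamma * grad(loss)(theta_new) has a closed form. The data and the step-size schedule are illustrative assumptions, and this is not the full AI-SGD procedure, which additionally averages the iterates.

    import numpy as np

    def implicit_sgd_least_squares(X, y, lr0=1.0, alpha=0.6):
        # Implicit SGD for 0.5 * (x.theta - y)^2; each implicit step solves
        # theta_new = theta - gamma * (x.theta_new - y) * x, which gives
        # theta <- theta - gamma / (1 + gamma * ||x||^2) * (x.theta - y) * x.
        theta = np.zeros(X.shape[1])
        for n, (x, yn) in enumerate(zip(X, y), start=1):
            gamma = lr0 / n ** alpha          # assumed decaying step-size schedule
            resid = x @ theta - yn
            theta -= gamma / (1.0 + gamma * (x @ x)) * resid * x
        return theta

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=2000)
    print(implicit_sgd_least_squares(X, y))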
AgFlow: Fast Model Selection of Penalized PCA via Implicit Regularization Effects of Gradient Flow
[article]
2021
arXiv
pre-print
effect introduced by (stochastic) gradient flow [3, 4] and obtains the complete solution path of L_2-penalized PCA under varying L_2-regularization. ...
In this paper, we propose a fast model selection method for penalized PCA, named Approximated Gradient Flow (AgFlow), which lowers the computation complexity through incorporating the implicit regularization ...
To lower the complexity, inspired by the recent progress on implicit regularization effects of gradient descent (GD) and stochastic gradient descent (SGD) in solving Ordinary Least-Square (OLS) problems ...
arXiv:2110.03273v1
fatcat:gszdfk6n5bd3piclseyb5w6tvm
Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks
[article]
2017
arXiv
pre-print
It uses different subsets of the data to update different subsets of the parameters, thus limiting the detrimental effect of outliers in the training set. ...
We present a stochastic first-order optimization algorithm, named BCSC, that adds a cyclic constraint to stochastic block-coordinate descent. ...
Stochastic gradient descent (SGD) [29, 31, 45] achieves the dual objective of reducing the computational load as well as improving generalization due to the implicit regularization effect. ...
arXiv:1711.07190v1
fatcat:pxheanmvtnbf7k7zcm7n3iprtm
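A rough sketch of the idea described in the entry above, cycling so that different subsets of the data update different blocks of parameters. This is a generic block-coordinate illustration on a linear least-squares problem under assumed partitions, not the BCSC algorithm from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 6))
    y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=300)
    theta = np.zeros(6)

    param_blocks = np.array_split(np.arange(6), 3)    # assumed parameter partition
    data_blocks = np.array_split(np.arange(300), 3)   # assumed data partition
    lr = 0.05

    for epoch in range(50):
        for shift in range(3):
            for j, params in enumerate(param_blocks):
                # Cyclic pairing: parameter block j sees data block (j + shift) mod 3.
                rows = data_blocks[(j + shift) % 3]
                Xb, yb = X[rows], y[rows]
                grad_full = Xb.T @ (Xb @ theta - yb) / len(rows)
                theta[params] -= lr * grad_full[params]

    print(theta)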
Matrix Decomposition for Recommendation System
2015
American Journal of Software Engineering and Applications
Alternating least squares (ALS) and stochastic gradient descent (SGD) are two popular approaches to solving optimization problems. ...
Based on the idea of the ALS-WR algorithm, we propose a modified SGD algorithm. In experiments on a test dataset, our algorithm outperforms ALS-WR. ...
Thus, there is a high computational cost in solving loss function (4).
Stochastic Gradient Descent Algorithm: Stochastic gradient descent (SGD) is based on gradient descent. ...
doi:10.11648/j.ajsea.20150404.11
fatcat:jrzabhwzhrcvjgrskoikx33vpa
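The entry above contrasts ALS with SGD for matrix factorization; below is a minimal sketch of the standard SGD update for a rating matrix, assuming a squared-error loss with L2 regularization. The synthetic ratings, learning rate, and regularization constant are illustrative assumptions, and the paper's modified algorithm and its loss function (4) are not reproduced here.

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, k = 50, 40, 4
    P = 0.1 * rng.normal(size=(n_users, k))   # user factors
    Q = 0.1 * rng.normal(size=(n_items, k))   # item factors

    # Observed (user, item, rating) triples; synthetic data for illustration.
    ratings = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
               for _ in range(2000)]

    lr, reg = 0.01, 0.05
    for epoch in range(20):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            pu = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])   # gradient step on user factors
            Q[i] += lr * (err * pu - reg * Q[i])     # gradient step on item factors

    rmse = np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))
    print("train RMSE:", rmse)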
On regularization of gradient descent, layer imbalance and flat minima
[article]
2020
arXiv
pre-print
Finally, we extend the analysis for stochastic gradient descent and show that SGD works similarly to noise regularization. ...
We demonstrate that different regularization methods, such as weight decay or noise data augmentation, behave in a similar way. ...
Acknowledgments We would like to thank Vitaly Lavrukhin, Nadav Cohen and Daniel Soudry for the valuable feedback. ...
arXiv:2007.09286v1
fatcat:e3klkqe3zff3jklnavyk6j7ljy
Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization
[article]
2019
arXiv
pre-print
In an attempt to shed some light on why this is the case, we revisit some minimax properties of stochastic gradient descent (SGD) for the square loss of linear models---originally developed in the 1990 ...
Stochastic descent methods (of the gradient and mirror varieties) have become increasingly popular in optimization. ...
WARM-UP: REVISITING SGD ON SQUARE LOSS OF LINEAR MODELS. In this section, we describe the main ideas and results in a simple setting, i.e., stochastic gradient descent (SGD) for the square loss of a linear ...
arXiv:1806.00952v4
fatcat:t7nh7hioszfz7hdu5yimuulxhy
Natural gradient, fitness modelling and model selection: A unifying perspective
2013
2013 IEEE Congress on Evolutionary Computation
Finally, we interpret Linear Programming relaxation as an example of Stochastic Relaxation, with respect to the regular gradient. ...
In this paper Stochastic Relaxation is used to provide theoretical results on Estimation of Distribution Algorithms (EDAs). ...
This leads to two different algorithms, Stochastic Gradient Descent (SGD), based on the regular gradient, and Stochastic Natural Gradient Descent (SNGD), based on the natural gradient, reported in Algorithm ...
doi:10.1109/cec.2013.6557608
dblp:conf/cec/MalagoMP13
fatcat:e3smb2pcojhmvi7ootvlszxg2y
Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent
[article]
2022
arXiv
pre-print
On the other hand, algorithms such as gradient descent and stochastic gradient descent have an implicit regularization property that leads to better generalization error. ...
In such a setting, the stochastic mirror descent (SMD) algorithm is a numerically efficient method -- each iteration involving a very small subset of the data. ...
There are works on implicit regularization for Gradient Descent [5]-[8], Stochastic Gradient Descent [9]-[12], and Stochastic Mirror Descent [13]. ...
arXiv:2205.00058v1
fatcat:2onmrxjhfvfw3blmv2j4ybp4ua
Early Stopping is Nonparametric Variational Inference
[article]
2015
arXiv
pre-print
By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood. ...
We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. ...
We thank Analog Devices International and Samsung Advanced Institute of Technology for their support. ...
arXiv:1504.01344v1
fatcat:jv4bcypomvda7fp6vhfdkmoteq
Showing results 1 — 15 out of 8,145 results