8,145 Hits in 5.5 sec

Implicit Gradient Regularization [article]

David G.T. Barrett, Benoit Dherin
2021 arXiv   pre-print
We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients.  ...  We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization.  ...  Implicit gradient regularization is the implicit regularisation behaviour originating from the use of discrete update steps in gradient descent, as characterized by Equation 2.  ... 
arXiv:2009.11162v2 fatcat:f2ylmi3eoncvjjavna3ossub6i
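
The snippet sketches the core claim: discrete gradient descent steps implicitly follow the gradient flow of a loss augmented with a penalty on the squared gradient norm. Below is a minimal numpy sketch that applies this penalty explicitly to a least-squares problem; the h/4 coefficient follows the paper's headline result, while the data, step size, and iteration count are illustrative assumptions.

```python
# Explicit version of the implicit penalty: gradient descent on the modified
# loss  L_tilde(w) = L(w) + lam * ||grad L(w)||^2, with lam ~ h/4 (assumed
# coefficient, following the backward-error-analysis result cited above).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

def loss_grad(w):
    r = X @ w - y
    return (r @ r) / len(y), X.T @ r / len(y)   # loss, gradient

h = 0.1                   # step size
lam = h / 4.0             # IGR coefficient (assumption of this sketch)
H = X.T @ X / len(y)      # Hessian of the quadratic loss (constant here)

w = np.zeros(10)
for _ in range(500):
    _, g = loss_grad(w)
    g_tilde = g + lam * 2.0 * (H @ g)   # gradient of L + lam * ||grad L||^2
    w -= h * g_tilde
```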

Regularization in network optimization via trimmed stochastic gradient descent with noisy label [article]

Kensuke Nakamura, Bong-Soo Sohn, Kyoung-Jae Won, Byung-Woo Hong
2022 arXiv   pre-print
The label noise provides a strong implicit regularization by replacing the target ground-truth labels of training examples with uniform random labels.  ...  Regularization is essential for avoiding over-fitting to training data in network optimization, leading to better generalization of the trained networks.  ...  The neural network model is trained using stochastic gradient descent (SGD) and its variants in combination with explicit and implicit regularization methods.  ... 
arXiv:2012.11073v3 fatcat:si5b25mbqzea5aytmxlami55gy
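
The abstract's mechanism, replacing the ground-truth labels of a fraction of training examples with uniform random labels, can be sketched in a few lines; the helper name and the noise_rate value below are assumptions for illustration, not the paper's settings.

```python
# Minimal sketch: resample a `noise_rate` fraction of labels uniformly at
# random before an SGD epoch, as a source of implicit regularization.
import numpy as np

def inject_uniform_label_noise(labels, num_classes, noise_rate, rng):
    """Return a copy of `labels` with a `noise_rate` fraction replaced by uniform random labels."""
    labels = labels.copy()
    idx = rng.choice(len(labels), size=int(noise_rate * len(labels)), replace=False)
    labels[idx] = rng.integers(0, num_classes, size=len(idx))
    return labels

rng = np.random.default_rng(0)
y_clean = rng.integers(0, 10, size=1000)
y_noisy = inject_uniform_label_noise(y_clean, num_classes=10, noise_rate=0.2, rng=rng)
```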

The Implicit Regularization of Stochastic Gradient Flow for Least Squares [article]

Alnur Ali, Edgar Dobriban, Ryan J. Tibshirani
2020 arXiv   pre-print
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression.  ...  We give a bound on the excess risk of stochastic gradient flow at time t, over ridge regression with tuning parameter λ = 1/t.  ...  of the current work on implicit regularization.  ... 
arXiv:2003.07802v2 fatcat:req5g6wjlncsbexknakh7kbyii
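
The stated correspondence, gradient flow at time t versus ridge regression with tuning parameter λ = 1/t, can be checked numerically for least squares. The sketch below compares the two estimators for full-batch gradient flow (the paper's bound concerns the stochastic version); the problem dimensions and time grid are arbitrary.

```python
# Gradient flow for least squares from w(0)=0 has the closed form
#   w(t) = (I - exp(-t S)) S^{-1} b,  with S = X^T X / n and b = X^T y / n,
# which is compared here against ridge regression with lambda = 1/t.
import numpy as np

rng = np.random.default_rng(0)
n, d = 300, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)

S = X.T @ X / n
b = X.T @ y / n
evals, V = np.linalg.eigh(S)                 # S = V diag(evals) V^T

def gradient_flow(t):
    coef = (1.0 - np.exp(-t * evals)) / evals
    return V @ (coef * (V.T @ b))

def ridge(lam):
    return np.linalg.solve(S + lam * np.eye(d), b)

for t in [1.0, 10.0, 100.0]:
    gap = np.linalg.norm(gradient_flow(t) - ridge(1.0 / t))
    print(f"t={t:6.1f}  ||w_gf(t) - w_ridge(1/t)|| = {gap:.4f}")
```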

SGD Implicitly Regularizes Generalization Error [article]

Daniel A. Roberts
2021 arXiv   pre-print
We then compare the change in the test error for stochastic gradient descent to the change in test error from an equivalent number of gradient descent updates and show explicitly that stochastic gradient  ...  These calculations depend on the details of the model only through the mean and covariance of the gradient distribution, which may be readily measured for particular models of interest.  ...  This extended abstract was brought to you by the letter Σ after averaging over many different realizations.  ... 
arXiv:2104.04874v1 fatcat:suctw27psffa5chxf2w2yup4m4
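
Since the snippet notes that the analysis depends on the model only through the mean and covariance of the gradient distribution, "which may be readily measured", here is a minimal sketch of that measurement for a least-squares model, treating each example's gradient as one sample from the gradient distribution.

```python
# Estimate the mean and covariance of per-example gradients for a linear
# model with squared loss; the data and parameter vector are placeholders.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=500)
w = rng.normal(size=5)

per_example_grads = X * (X @ w - y)[:, None]   # shape (n, d): gradient of each example's loss
grad_mean = per_example_grads.mean(axis=0)     # equals the full-batch gradient
grad_cov = np.cov(per_example_grads, rowvar=False)
print(grad_mean.shape, grad_cov.shape)         # (5,), (5, 5)
```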

Regularization in Network Optimization via Trimmed Stochastic Gradient Descent With Noisy Label

Kensuke Nakamura, Bong-Soo Sohn, Kyoung-Jae Won, Byung-Woo Hong
2022 IEEE Access  
The label noise provides a strong implicit regularization by replacing the target ground-truth labels of training examples with uniform random labels.  ...  Regularization is essential for avoiding over-fitting to training data in network optimization, leading to better generalization of the trained networks.  ...  The neural network model is trained using stochastic gradient descent (SGD) and its variants in combination with explicit and implicit regularization methods.  ... 
doi:10.1109/access.2022.3171910 fatcat:cjw5livwqbhlrithvtsplgbcju

Stochastic Training is Not Necessary for Generalization [article]

Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
2022 arXiv   pre-print
It is widely believed that the implicit regularization of SGD is fundamental to the impressive generalization behavior we observe in neural networks.  ...  In this work, we demonstrate that non-stochastic full-batch training can achieve comparably strong performance to SGD on CIFAR-10 using modern architectures.  ...  Arguably this means the regularization is applied explicitly on top of the implicit effect that is already present.  ... 
arXiv:2109.14119v2 fatcat:izkob2pvcfefhaqospgdzjnr7e

Towards stability and optimality in stochastic gradient descent [article]

Panos Toulis, Dustin Tran, Edoardo M. Airoldi
2016 arXiv   pre-print
Iterative procedures for parameter estimation based on stochastic gradient descent allow the estimation to scale to massive data sets.  ...  In practice, AI-SGD achieves competitive performance with other state-of-the-art procedures.  ...  gradient descent (implicit SGD).  ... 
arXiv:1505.02417v4 fatcat:tdbkb6kx5vgaxev5fmzwbwvfz4
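
The snippet refers to implicit SGD, where the stochastic gradient is evaluated at the next iterate rather than the current one. For the squared loss of a linear model this implicit step has a closed form, sketched below together with a running average of the iterates (the averaging idea behind AI-SGD); the step-size schedule and data are assumptions of this sketch.

```python
# Implicit SGD for least squares: the update w_new = w + gamma*(y - x@w_new)*x
# solves in closed form to w_new = w + gamma/(1 + gamma*||x||^2)*(y - x@w)*x.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
w_bar = np.zeros(d)                      # running average of the iterates
for i, k in enumerate(rng.permutation(n), start=1):
    x, target = X[k], y[k]
    gamma = 1.0 / np.sqrt(i)             # assumed step-size schedule
    w = w + gamma / (1.0 + gamma * (x @ x)) * (target - x @ w) * x
    w_bar += (w - w_bar) / i             # the averaging ("AI") step
```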

AgFlow: Fast Model Selection of Penalized PCA via Implicit Regularization Effects of Gradient Flow [article]

Haiyan Jiang, Haoyi Xiong, Dongrui Wu, Ji Liu, Dejing Dou
2021 arXiv   pre-print
effect introduced by (stochastic) gradient flow [3, 4] and obtains the complete solution path of L_2-penalized PCA under varying L_2-regularization.  ...  In this paper, we propose a fast model selection method for penalized PCA, named Approximated Gradient Flow (AgFlow), which lowers the computation complexity through incorporating the implicit regularization  ...  To lower the complexity, inspired by the recent progress on implicit regularization effects of gradient descent (GD) and stochastic gradient descent (SGD) in solving Ordinary Least-Square (OLS) problems  ... 
arXiv:2110.03273v1 fatcat:gszdfk6n5bd3piclseyb5w6tvm

Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks [article]

Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong
2017 arXiv   pre-print
It uses different subsets of the data to update different subsets of the parameters, thus limiting the detrimental effect of outliers in the training set.  ...  We present a stochastic first-order optimization algorithm, named BCSC, that adds a cyclic constraint to stochastic block-coordinate descent.  ...  Stochastic gradient descent (SGD) [29, 31, 45] achieves the dual objective of reducing the computational load as well as improving generalization due to the implicit regularization effect.  ... 
arXiv:1711.07190v1 fatcat:pxheanmvtnbf7k7zcm7n3iprtm
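
A simplified sketch of the block-cyclic pairing described in the snippet is below: the parameters are split into blocks and the data into shards, and each parameter block is updated with the gradient computed on one shard, with the block-to-shard assignment rotated cyclically. This illustrates the pairing scheme only and is not the paper's full BCSC algorithm; all sizes and the learning rate are assumptions.

```python
# Block-cyclic coordinate updates on a least-squares toy problem.
import numpy as np

rng = np.random.default_rng(0)
n, d, B = 600, 12, 3                      # examples, parameters, number of blocks
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

param_blocks = np.array_split(np.arange(d), B)
data_shards = np.array_split(np.arange(n), B)

w, lr = np.zeros(d), 0.05
for epoch in range(50):
    for j, block in enumerate(param_blocks):
        shard = data_shards[(j + epoch) % B]       # cyclic block-to-shard pairing
        Xs, ys = X[shard], y[shard]
        grad = Xs.T @ (Xs @ w - ys) / len(shard)
        w[block] -= lr * grad[block]               # update only this parameter block
```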

Matrix Decomposition for Recommendation System

Jie Zhu
2015 American Journal of Software Engineering and Applications  
Alternating least squares (ALS) and stochastic gradient descent (SGD) are two popular approaches to solving optimization problems.  ...  Based on the idea of the ALS-WR algorithm, we propose a modified SGD algorithm. With experiments on a test dataset, our algorithm outperforms ALS-WR.  ...  Thus, there is a high computational cost in solving loss function (4). Stochastic Gradient Descent Algorithm Stochastic gradient descent (SGD) is based on gradient descent.  ... 
doi:10.11648/j.ajsea.20150404.11 fatcat:jrzabhwzhrcvjgrskoikx33vpa
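
For context, the baseline that both ALS and the paper's modified SGD build on is SGD for matrix factorization with squared loss and L2 regularization. A minimal sketch follows; the learning rate, regularization weight, rank, and synthetic ratings are illustrative assumptions.

```python
# Plain SGD for matrix factorization: for each rating (u, i, r), update the
# user and item factor rows with the regularized squared-error gradient.
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, rank = 50, 40, 5
ratings = [(rng.integers(num_users), rng.integers(num_items), rng.uniform(1, 5))
           for _ in range(500)]

P = 0.1 * rng.normal(size=(num_users, rank))   # user factors
Q = 0.1 * rng.normal(size=(num_items, rank))   # item factors
lr, reg = 0.02, 0.05

for epoch in range(20):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]
        pu = P[u].copy()                        # use the old user factors for the item update
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * pu - reg * Q[i])
```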

On regularization of gradient descent, layer imbalance and flat minima [article]

Boris Ginsburg
2020 arXiv   pre-print
Finally, we extend the analysis for stochastic gradient descent and show that SGD works similarly to noise regularization.  ...  We demonstrate that different regularization methods, such as weight decay or noise data augmentation, behave in a similar way.  ...  Acknowledgments We would like to thank Vitaly Lavrukhin, Nadav Cohen and Daniel Soudry for the valuable feedback.  ... 
arXiv:2007.09286v1 fatcat:e3klkqe3zff3jklnavyk6j7ljy

Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization [article]

Navid Azizan, Babak Hassibi
2019 arXiv   pre-print
In an attempt to shed some light on why this is the case, we revisit some minimax properties of stochastic gradient descent (SGD) for the square loss of linear models---originally developed in the 1990  ...  Stochastic descent methods (of the gradient and mirror varieties) have become increasingly popular in optimization.  ...  WARM-UP: REVISITING SGD ON SQUARE LOSS OF LINEAR MODELS In this section, we describe the main ideas and results in a simple setting, i.e., stochastic gradient descent (SGD) for the square loss of a linear  ... 
arXiv:1806.00952v4 fatcat:t7nh7hioszfz7hdu5yimuulxhy

Natural gradient, fitness modelling and model selection: A unifying perspective

Luigi Malago, Matteo Matteucci, Giovanni Pistone
2013 2013 IEEE Congress on Evolutionary Computation  
Finally, we interpret Linear Programming relaxation as an example of Stochastic Relaxation, with respect to the regular gradient.  ...  In this paper Stochastic Relaxation is used to provide theoretical results on Estimation of Distribution Algorithms (EDAs).  ...  This leads to two different algorithms, Stochastic Gradient Descent (SGD), based on the regular gradient, and Stochastic Natural Gradient Descent (SNGD), based on the natural gradient, reported in Algorithm  ... 
doi:10.1109/cec.2013.6557608 dblp:conf/cec/MalagoMP13 fatcat:e3smb2pcojhmvi7ootvlszxg2y

Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent [article]

Yiling Luo, Xiaoming Huo, Yajun Mei
2022 arXiv   pre-print
On the other hand, algorithms such as gradient descent and stochastic gradient descent have the implicit regularization property that leads to better performance in terms of the generalization errors.  ...  In such a setting, the stochastic mirror descent (SMD) algorithm is a numerically efficient method -- each iteration involving a very small subset of the data.  ...  There are works on implicit regularization for Gradient Descent [5] - [8] , Stochastic Gradient Descent [9] - [12] , and Stochastic Mirror Descent [13] .  ... 
arXiv:2205.00058v1 fatcat:2onmrxjhfvfw3blmv2j4ybp4ua
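
The snippet's object of study, stochastic mirror descent, updates the iterate in a mirror domain defined by a potential function. Below is a minimal sketch with the potential ψ(w) = (1/q)‖w‖_q^q on an under-determined least-squares problem; the choice q = 1.5, the step size, and the iteration count are assumptions, and the implicit-regularization results above concern which interpolating solution such iterations select.

```python
# Stochastic mirror descent: map the iterate through grad(psi), take a
# stochastic gradient step in the mirror domain, then map back.
import numpy as np

rng = np.random.default_rng(0)
n, d = 80, 200                            # under-determined: many interpolating solutions
X = rng.normal(size=(n, d))
y = X @ (rng.normal(size=d) * (rng.random(d) < 0.05))   # sparse ground truth

q, lr = 1.5, 0.01
mirror = lambda w: np.sign(w) * np.abs(w) ** (q - 1)            # grad(psi)
mirror_inv = lambda z: np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

w = np.zeros(d)
for step in range(20000):
    k = rng.integers(n)                   # each iteration touches a single example
    g = (X[k] @ w - y[k]) * X[k]          # stochastic gradient of the squared loss
    w = mirror_inv(mirror(w) - lr * g)
```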

Early Stopping is Nonparametric Variational Inference [article]

Dougal Maclaurin, David Duvenaud, Ryan P. Adams
2015 arXiv   pre-print
By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood.  ...  We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution.  ...  We thank Analog Devices International and Samsung Advanced Institute of Technology for their support.  ... 
arXiv:1504.01344v1 fatcat:jv4bcypomvda7fp6vhfdkmoteq
Showing results 1 — 15 out of 8,145 results