81,153 Hits in 3.2 sec

Implicit Gradient Regularization [article]

David G.T. Barrett, Benoit Dherin
2021 arXiv   pre-print
Furthermore, we demonstrate that the implicit gradient regularization term can be used as an explicit regularizer, allowing us to control this gradient regularization directly.  ...  We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization.  ...  we call Implicit Gradient Regularization (IGR).  ... 
arXiv:2009.11162v2 fatcat:f2ylmi3eoncvjjavna3ossub6i

Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization [article]

Navid Azizan, Babak Hassibi
2019 arXiv   pre-print
We further show that this identity can be used to naturally establish other properties of SMD (and SGD), namely convergence and implicit regularization for over-parameterized linear models (in what is  ...  regularization properties for deep learning.  ...  The convergence and implicit regularization results hold similarly, and can be formally stated as follows. Proposition 13. Consider the following two cases.  ... 
arXiv:1806.00952v4 fatcat:t7nh7hioszfz7hdu5yimuulxhy

Accelerated Gradient Flow: Risk, Stability, and Implicit Regularization [article]

Yue Sheng, Alnur Ali
2022 arXiv   pre-print
Acceleration and momentum are the de facto standard in modern applications of machine learning and optimization, yet the bulk of the work on implicit regularization focuses instead on unaccelerated methods  ...  In this paper, we study the statistical risk of the iterates generated by Nesterov's accelerated gradient method and Polyak's heavy ball method, when applied to least squares regression, drawing several  ...  Implicit regularization. Nearly all of the work on implicit regularization so far has looked at unaccelerated first-order methods, as mentioned in the introduction.  ... 
arXiv:2201.08311v1 fatcat:fxlyxvfs4barhktxbc24nossey

Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks [article]

Gauthier Gidel, Francis Bach, Simon Lacoste-Julien
2019 arXiv   pre-print
Consequently, this choice can be considered as an implicit regularization for the training of over-parametrized models.  ...  In this work, we push this idea further by studying the discrete gradient dynamics of the training of a two-layer linear network with the least-squares loss.  ...  Finally, Gunasekar et al. [2018] compared the implicit regularization provided by gradient descent in deep linear convolutional and fully connected networks.  ... 
arXiv:1904.13262v2 fatcat:mlvfdiv5cbcdjee3ndjp7km43y

The Implicit Regularization of Stochastic Gradient Flow for Least Squares [article]

Alnur Ali, Edgar Dobriban, Ryan J. Tibshirani
2020 arXiv   pre-print
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression.  ...  We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow.  ...  Implicit Regularization.  ... 
arXiv:2003.07802v2 fatcat:req5g6wjlncsbexknakh7kbyii

The Implicit Regularization of Momentum Gradient Descent with Early Stopping [article]

Li Wang
2022 arXiv   pre-print
The study on the implicit regularization induced by gradient-based optimization is a longstanding pursuit.  ...  In the present paper, we characterize the implicit regularization of momentum gradient descent (MGD) with early stopping by comparing with the explicit ℓ_2-regularization (ridge).  ...  ., 2020] studied the implicit regularization of stochastic gradient descent (SGD).  ... 
arXiv:2201.05405v1 fatcat:zf7oltcn4fga5aofskqax4dpaa

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks [article]

Rama Cont, Alain Rossier, Renyuan Xu
2022 arXiv   pre-print
The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path.  ...  We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function.  ...  Our result shows how implicit regularization emerges from gradient descent.  ... 
arXiv:2204.07261v2 fatcat:mgjnsugs3vgv5fipymihg4xqf4

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution [article]

Cong Ma, Kaizheng Wang, Yuejie Chi, Yuxin Chen
2019 arXiv   pre-print
This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings.  ...  This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical  ...  This "implicit regularization" phenomenon is of fundamental importance, suggesting that vanilla gradient descent proceeds as if it were properly regularized.  ... 
arXiv:1711.10467v3 fatcat:qmqqujn55nhtbncwcy3c7ka66m

Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications [article]

Deren Lei, Zichen Sun, Yijun Xiao, William Yang Wang
2018 arXiv   pre-print
Some theoretical studies have analyzed the implicit regularization effect of stochastic gradient descent (SGD) on simple machine learning models with certain assumptions.  ...  To bridge this gap, we study the role of SGD implicit regularization in deep learning systems.  ...  SGD and implicit regularization We show that the implicit regularization effect of SGD gradually disappears as pure SGD moves towards mini-batch methods.  ... 
arXiv:1811.00659v1 fatcat:pvuy2jivlzgb5e3bz2ecd53kim

Regularization of microplane damage models using an implicit gradient enhancement

Imadeddin Zreid, Michael Kaliske
2014 International Journal of Solids and Structures  
This problem demands some regularization method to stabilize the solution. The paper focuses on the efficient implementation of implicit gradient enhancement for microplane damage models.  ...  The new method limits the number of additional degrees of freedom to one, while preserving the regularizing effect.  ...  There are two types of gradient models, explicit and implicit. Explicit gradient enhancement is only weakly nonlocal, thus fails to regularize the solution under some circumstances.  ... 
doi:10.1016/j.ijsolstr.2014.06.020 fatcat:hy3ur4rarjd6zi2oz6iu7agayy

Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent [article]

Wenqing Hu, Zhanxing Zhu, Haoyi Xiong, Jun Huan
2019 arXiv   pre-print
In this case, we demonstrate an example that shows how the noise covariance structure plays a role in "implicit regularization", a phenomenon in which SGD favors some particular local minimum points.  ...  We interpret the variational inference of the Stochastic Gradient Descent (SGD) as minimizing a new potential function named the quasi-potential.  ...  This has been discussed as a manifestation of implicit regularization via the SGD trajectory (see [3] ).  ... 
arXiv:1901.06054v1 fatcat:fwc65mjb7nakfn42jf233h3nha

Direction Matters: On the Implicit Regularization Effect of Stochastic Gradient Descent with Moderate Learning Rate [article]

Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu
2020 arXiv   pre-print
Understanding the algorithmic regularization effect of stochastic gradient descent (SGD) is one of the key challenges in modern machine learning and deep learning theory.  ...  In this paper, we make an initial attempt to characterize the particular regularization effect of SGD in the moderate learning rate regime by studying its behavior for optimizing an overparameterized linear  ...  Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion.  ... 
arXiv:2011.02538v1 fatcat:ja3con277vb5zlfn5ms7lsruwe

AgFlow: Fast Model Selection of Penalized PCA via Implicit Regularization Effects of Gradient Flow [article]

Haiyan Jiang, Haoyi Xiong, Dongrui Wu, Ji Liu, Dejing Dou
2021 arXiv   pre-print
In this paper, we propose a fast model selection method for penalized PCA, named Approximated Gradient Flow (AgFlow), which lowers the computation complexity through incorporating the implicit regularization  ...  effect introduced by (stochastic) gradient flow [3, 4] and obtains the complete solution path of L_2-penalized PCA under varying L_2-regularization.  ...  Implicit Regularization with (Stochastic) Gradient Flow.  ... 
arXiv:2110.03273v1 fatcat:gszdfk6n5bd3piclseyb5w6tvm

Evaluation of the Implicit Gradient-Enhanced Regularization of a Damage-Plasticity Rock Model

Magdalena Schreter, Matthias Neuner, Günter Hofstetter
2018 Applied Sciences  
In the present publication, the performance of an implicit gradient-enhanced damage-plasticity model is evaluated with special focus on the prediction of complex failure modes such as shear failure.  ...  To this end, an implicit gradient-enhanced damage-plasticity rock model is presented and validated by means of 2D and 3D finite element simulations of both laboratory tests on intact rock specimens as  ...  Motivated by those deficiencies, in the following, a more sophisticated regularization technique based on the implicit gradient-enhanced formulation [32] is presented.  ... 
doi:10.3390/app8061004 fatcat:smiy7gn4hrgadlay3bf5prrzre

An implicit gradient model by a reproducing kernel strain regularization in strain localization problems

J. Chen
2004 Computer Methods in Applied Mechanics and Engineering  
Both continuum and discrete forms of RKSR are presented, and they lead to an implicit representation of gradient models.  ...  A reproducing kernel strain regularization (RKSR) as a mathematical generalization of gradient theory and nonlocal theory for strain localization problems is presented.  ...  In this paper, we propose a reproducing kernel strain regularization (RKSR) as an implicit representation of gradient models.  ... 
doi:10.1016/s0045-7825(04)00111-2 fatcat:u3qbn3kb6rfync4tiagm5yloeq
Showing results 1–15 of 81,153