Understanding and Scheduling Weight Decay
[article]
2021
arXiv
pre-print
Third, we provide an effective learning-rate-aware scheduler for weight decay, called the Stable Weight Decay (SWD) method, which, to the best of our knowledge, is the first practical design for weight ...
Weight decay is a popular and even necessary regularization technique for training deep neural networks that generalize well. ...
Stable/Decoupled Weight Decay often outperforms L2 regularization for optimizers involving momentum. ... (a minimal update-rule sketch follows this entry)
arXiv:2011.11152v4
fatcat:gbuwvxetvnbb5cpa6hfpwsh34u
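The SWD entry above contrasts decoupled (Stable) weight decay with L2 regularization under momentum. The sketch below is a minimal illustration of that distinction on plain SGD with momentum, not the paper's SWD scheduler; the function name, the toy quadratic loss, and the hyperparameters are assumptions made for the example.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.1, mu=0.9, wd=1e-2, decoupled=True):
    """One SGD-with-momentum step; weight decay is applied either decoupled
    from the gradient or folded into it as an L2 term (illustrative only)."""
    if decoupled:
        # Decoupled weight decay: the shrink term bypasses the momentum buffer.
        v = mu * v + grad
        w = w - lr * v - lr * wd * w
    else:
        # L2 regularization: wd * w enters the gradient and is therefore
        # accumulated and rescaled by the momentum buffer as well.
        v = mu * v + (grad + wd * w)
        w = w - lr * v
    return w, v

# Toy quadratic loss 0.5 * w**2, whose gradient is simply w.
w, v = np.array(2.0), np.array(0.0)
for _ in range(100):
    w, v = sgd_momentum_step(w, v, grad=w, decoupled=True)
print(float(w))  # shrinks toward 0 under both variants on this toy loss
```

The distinction matters because in the L2 variant the decay term is filtered through the momentum (and, for Adam-type optimizers, second-moment) statistics, whereas the decoupled variant shrinks the weights by a fixed factor each step.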
Page 478 of Neural Computation Vol. 8, Issue 3
[page]
1996
Neural Computation
The smoothing regularizer yields a symmetric α-stable (or leptokurtic) distribution of weights (large peak near zero and long tails), whereas the quadratic weight decay produces a distribution that is ...
trained with our smoothing regularizer and those with standard weight decay. ...
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
[article]
2022
arXiv
pre-print
To solve these two problems, a simple-but-efficient regularization method, termed Beta-Decay, is proposed to regularize the DARTS-based NAS searching process. ...
Specifically, Beta-Decay regularization imposes constraints to keep the value and variance of the activated architecture parameters from becoming too large. ...
As shown in Fig. 2, DARTS with L2 or weight decay regularization suffers from the performance collapse issue, while DARTS with Beta-Decay regularization has a stable search process. ... (an illustrative sketch of such a penalty follows this entry)
arXiv:2203.01665v2
fatcat:3q3daee5lvhs5elcvu6vujzb3e
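The β-DARTS snippets above say the regularizer acts on the activated (softmax-normalized) architecture parameters rather than on the raw ones. The sketch below illustrates one such penalty; the quadratic form on β = softmax(α) is my assumption for the example and may differ from the paper's exact Beta-Decay term.

```python
import numpy as np

def softmax(alpha):
    e = np.exp(alpha - alpha.max())
    return e / e.sum()

def beta_decay_penalty(alpha, lam=1e-3):
    """Penalty on the activated architecture parameters beta = softmax(alpha).
    Minimizing sum(beta**2) pushes beta toward uniform, keeping both the peak
    value and the variance of the activated parameters small (illustrative)."""
    beta = softmax(alpha)
    return lam * np.sum(beta ** 2)

alpha = np.array([2.0, -1.0, 0.5])   # raw architecture parameters for one edge
print(beta_decay_penalty(alpha))
```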
Tangent-Space Regularization for Neural-Network Models of Dynamical Systems
[article]
2018
arXiv
pre-print
Furthermore, the influence of L_2 weight regularization on the learned Jacobian eigenvalue spectrum, and hence system stability, is investigated. ...
This work introduces the concept of tangent space regularization for neural-network models of dynamical systems. ...
The first examples demonstrate the effectiveness of tangent-space regularization, whereas later examples demonstrate the influence of weight decay.
arXiv:1806.09919v1
fatcat:z37e64da2jbw7cdhaoniu6snee
Two regularizers for recursive least squared algorithms in feedforward multilayered neural networks
2001
IEEE Transactions on Neural Networks
Though the standard RLS algorithm has an implicit weight decay term in its energy function, the weight decay effect decreases linearly as the number of learning epochs increases, thus rendering a diminishing weight decay effect as training progresses. ...
Background on Weight Decay: In the standard weight decay method, there is a quadratic regularization term in the cost function [15], [20], given by (1), where the regularization constant is a positive ... (the usual form of this term is reconstructed after this entry)
doi:10.1109/72.963768
pmid:18249961
fatcat:ntal4y42fvcvlfaef2vmwv7grq
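The "Background on Weight Decay" snippet above cites an equation (1) that is cut off in the excerpt. The usual form of the quadratic weight decay term, written here as an assumption about the paper's exact notation, is:

```latex
% Data-fit cost E_0 plus a quadratic penalty with regularization constant lambda > 0
E(\mathbf{w}) = E_0(\mathbf{w}) + \frac{\lambda}{2}\,\lVert \mathbf{w} \rVert_2^2,
\qquad
\nabla_{\mathbf{w}} E = \nabla_{\mathbf{w}} E_0(\mathbf{w}) + \lambda\,\mathbf{w}.
```

Under gradient descent with step size η, the penalty contributes a multiplicative shrink w ← (1 - ηλ)w at each step, which is where the name "weight decay" comes from.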
Understanding the Disharmony between Weight Normalization Family and Weight Decay: ϵ-shifted L_2 Regularizer
[article]
2019
arXiv
pre-print
Surprisingly, W must be decayed during gradient descent, otherwise we will observe a severe under-fitting problem, which is very counter-intuitive since weight decay is widely known to prevent deep networks ...
Furthermore, we also expose several critical problems when introducing a weight decay term to the weight normalization family, including the absence of a global minimum and training instability. ...
The central reason is that weight decay helps to control the effective learning rate in a stable and reasonable range. ...
arXiv:1911.05920v1
fatcat:frl43x35jrha7p2hloa2wr5wvi
Weight Rescaling: Effective and Robust Regularization for Deep Neural Networks with Batch Normalization
[article]
2022
arXiv
pre-print
To address those weaknesses, we propose to regularize the weight norm using a simple yet effective weight rescaling (WRS) scheme as an alternative to weight decay. ...
decay, implicit weight rescaling (weight standardization) and gradient projection (AdamP). ...
Weight Decay Regularization and BatchNorm: Several works have studied weight decay regularization and its effect on BatchNorm DNNs. ... (a sketch of a rescaling step follows this entry)
arXiv:2102.03497v2
fatcat:3cll3hd7cbctlp5yqaepqhyan4
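The WRS entry above proposes weight rescaling as an alternative to weight decay for networks with BatchNorm. A minimal sketch of one such scheme, periodically projecting each weight tensor back to a target norm, is shown below; the interval and target-norm choices are assumptions, not the paper's exact procedure.

```python
import numpy as np

def rescale_weights(weights, target_norms):
    """Project each weight tensor back to its target Frobenius norm.
    For scale-invariant (e.g. BatchNorm-followed) layers this leaves the
    network function unchanged but resets the effective learning rate."""
    for key, w in weights.items():
        norm = np.linalg.norm(w)
        if norm > 0:
            weights[key] = w * (target_norms[key] / norm)
    return weights

# Hypothetical usage inside a training loop: rescale every 1000 steps.
weights = {"conv1": np.random.randn(64, 3, 3, 3)}
target_norms = {k: np.linalg.norm(w) for k, w in weights.items()}  # norms at init
# if step % 1000 == 0:
#     weights = rescale_weights(weights, target_norms)
```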
A Smoothing Regularizer for Feedforward and Recurrent Neural Networks
1996
Neural Computation
Empirical results show that the smoothing regularizer yields a real symmetric α-stable (SαS) weight distribution, whereas standard quadratic weight decay produces a normal distribution. ...
The smoothing regularizer yields a symmetric α-stable (or leptokurtic) distribution of weights (large peak near zero and long tails), whereas the quadratic weight decay produces a distribution that is ...
doi:10.1162/neco.1996.8.3.461
fatcat:bqwqnucf2ndfndkb5gvjaevof4
Stable reduction to the pole at the magnetic equator
2001
Geophysics
The applied regularization alleviates the singularity associated with the wavenumber-domain RTP operator, and the imposed power spectral decay ensures that the constructed RTP field has the correct spectral ...
We develop a solution to this problem that allows stable reconstruction of the RTP field with a high fidelity even at the magnetic equator. ...
It is therefore necessary to incorporate the knowledge about the spectral decay through the use of the weighting function. ...
doi:10.1190/1.1444948
fatcat:dpa7uisnnfdmxlkgbwi6yq5xdm
Understanding the Disharmony between Weight Normalization Family and Weight Decay
2020
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
Surprisingly, W must be decayed during gradient descent, otherwise we will observe a severe under-fitting problem, which is very counter-intuitive since weight decay is widely known to prevent deep networks ...
Moreover, if we substitute (e.g., weight normalization) W′ = W/∥W∥ into the original loss function ∑i L(ƒ(xi; W′), yi) + ½λ∥W′∥², it is observed that the regularization term ½λ∥W′∥² will be canceled as a constant ... (a small numeric check follows this entry)
The central reason is that weight decay helps to control the effective learning rate in a stable and reasonable range. ...
doi:10.1609/aaai.v34i04.5904
fatcat:f6t3x7jfo5bs5dm4cmsmygdfre
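As the snippet in the entry above notes, once the weights are normalized, W′ = W/∥W∥ has unit norm, so the penalty ½λ∥W′∥² equals λ/2 regardless of W and exerts no regularizing force. A small numeric check (illustrative only):

```python
import numpy as np

lam = 1e-4
for _ in range(3):
    W = np.random.randn(64, 128) * np.random.uniform(0.1, 10)
    W_prime = W / np.linalg.norm(W)            # weight normalization
    penalty = 0.5 * lam * np.sum(W_prime ** 2)
    print(penalty)                             # always 0.5 * lam, whatever W is
```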
Surprising Instabilities in Training Deep Networks and a Theoretical Analysis
[article]
2022
arXiv
pre-print
We show that it is stable only under certain conditions on the learning rate and weight decay. ...
... localized over iterations and regions of the weight tensor space. ...
However, in the case that a < 0, the weight decay must be chosen large enough to be stable. ... (a toy stability calculation follows this entry)
arXiv:2206.02001v1
fatcat:wcfw6tjyubgthe2dlqpqhmm7na
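The snippets above say the training dynamics are stable only under conditions on the learning rate and weight decay, and that for a < 0 the decay must be large enough. A one-dimensional toy calculation (mine, not the paper's analysis) makes the shape of such a condition concrete: for the loss (a/2)·w², gradient descent with step size η and decoupled weight decay λ gives

```latex
w_{t+1} = w_t - \eta\, a\, w_t - \eta \lambda\, w_t
        = \bigl(1 - \eta\,(a + \lambda)\bigr)\, w_t ,
```

which is stable iff |1 - η(a + λ)| < 1, i.e. 0 < η(a + λ) < 2. When a < 0, this requires λ > -a (the weight decay must indeed be chosen large enough), consistent with the qualitative claim in the snippet.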
Page 2116 of The Journal of Neuroscience Vol. 16, Issue 6
[page]
1996
The Journal of Neuroscience
rule with weight decay (see Appendix 2). ...
C, When the weight regularization is too strong, the actual stable firing profile tends to be blunter than the desired one, or even becomes totally flat (not shown). ...
How I Learned to Stop Worrying and Love Retraining
[article]
2022
arXiv
pre-print
weight decay. ...
For the retraining phase we deactivate weight decay.
DPF: As for GMP, we tune the number of pruning steps, i.e., {20, 100}, and the weight decay. ...
Secondly, we denote the time needed when compared to regular training of a dense model, e.g. LC needs 1.14 times as much runtime as regular training. ...
arXiv:2111.00843v2
fatcat:gyet2ak2mrhuzgqzoqymva7uaa
Understanding the Role of Training Regimes in Continual Learning
[article]
2020
arXiv
pre-print
However, there has been limited prior work extensively analyzing the impact that different training regimes -- learning rate, batch size, regularization method -- can have on forgetting. ...
In particular, we study the effect of dropout, learning rate decay, and batch size, on forming training regimes that widen the tasks' local minima and consequently, on helping it not to forget catastrophically ...
Regularization: dropout and weight decay. We relate the theoretical insights on dropout and L2 regularization (weight decay) to our analysis in the previous section. ...
arXiv:2006.06958v1
fatcat:kq545vj3brf6nchbxjo3rlwnb4
Stabilization of the inverse Laplace transform of multiexponential decay through introduction of a second dimension
2013
Journal of Magnetic Resonance
We propose a new approach to stabilizing the inverse Laplace transform of a multiexponential decay signal, a classically ill-posed problem, in the context of nuclear magnetic resonance relaxometry. ...
We find markedly improved accuracy, and stability with respect to noise, as well as insensitivity to regularization in quantifying underlying relaxation components through use of the two-dimensional as ...
We find markedly improved stability, accuracy, and insensitivity to regularization. ...
doi:10.1016/j.jmr.2013.07.008
pmid:24035004
pmcid:PMC3818505
fatcat:wrrwuxmdbfhpjkyae6sjjrkno4