Do deep nets really need weight decay and dropout?
[article]
2018
arXiv
pre-print
This overparameterization is often said to be controlled with the help of different regularization techniques, mainly weight decay and dropout. ...
In this paper we build upon recent research that suggests that explicit regularization may not be as important as widely believed and carry out an ablation study that concludes that weight decay and dropout ...
ACKNOWLEDGMENTS: This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 641805. ...
arXiv:1802.07042v3
fatcat:c5jk6bpuhngqfbcrz7yca63oke
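The ablation described in this entry toggles weight decay and dropout independently while keeping everything else fixed. As a hedged illustration only (not the authors' code; the model, dataset and hyperparameter values below are placeholders), a minimal PyTorch sketch of such an ablation could look like this:

    # Hypothetical ablation sketch: train the same small classifier with and
    # without weight decay and dropout, everything else held fixed.
    import torch
    import torch.nn as nn

    def make_model(dropout_rate: float) -> nn.Module:
        return nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Dropout(p=dropout_rate),      # dropout_rate=0.0 disables dropout
            nn.Linear(256, 10),
        )

    def make_optimizer(model: nn.Module, weight_decay: float):
        # weight_decay=0.0 disables the L2 penalty
        return torch.optim.SGD(model.parameters(), lr=0.1,
                               momentum=0.9, weight_decay=weight_decay)

    # Four ablation settings: no regularization, dropout only,
    # weight decay only, and both together.
    settings = [(0.0, 0.0), (0.5, 0.0), (0.0, 1e-4), (0.5, 1e-4)]
    for dropout_rate, wd in settings:
        model = make_model(dropout_rate)
        optimizer = make_optimizer(model, wd)
        # ... train and evaluate each configuration identically ...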
Do Deep Convolutional Nets Really Need to be Deep and Convolutional?
[article]
2017
arXiv
pre-print
This paper provides the first empirical demonstration that deep convolutional models really need to be both deep and convolutional, even when trained with methods such as distillation that allow small ...
Although the student models do not have to be as deep as the teacher model they mimic, the students need multiple convolutional layers to learn functions of comparable accuracy as the deep convolutional ...
original 0/1 hard class labels using Bayesian optimization with dropout and weight decay. ...
arXiv:1603.05691v4
fatcat:nixa6yw7zje5nb2rdqrmf6tqwu
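Distillation, mentioned in the snippet above, trains a shallower student to mimic a deeper teacher. The sketch below shows the common softened-softmax distillation loss as an illustration only; the paper itself mimics teacher logits directly, and the temperature and weighting here are assumed values:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=4.0, alpha=0.5):
        # Soft targets: KL divergence between softened teacher and student
        # distributions, scaled by T^2 as is conventional.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean",
        ) * temperature ** 2
        # Hard targets: ordinary cross-entropy on the 0/1 class labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Example with random tensors standing in for real model outputs.
    student_logits = torch.randn(8, 10)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    print(distillation_loss(student_logits, teacher_logits, labels))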
Improving neural networks by preventing co-adaptation of feature detectors
[article]
2012
arXiv
pre-print
Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition. ...
It allows much larger nets to be trained and removes the need for early stopping. ...
Deep Belief Nets - We took a neural network pretrained using a Deep Belief Network (5). ...
arXiv:1207.0580v1
fatcat:onupztn52jcl7az7ksbptz34lm
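The dropout idea in this entry is simple to state: randomly zero out hidden units during training so feature detectors cannot co-adapt. A minimal NumPy sketch of the inverted-dropout variant, which matches in expectation the test-time weight halving described in the paper (the rate below is illustrative):

    import numpy as np

    def dropout(activations: np.ndarray, p: float = 0.5, training: bool = True):
        """Inverted dropout: zero each unit with probability p and rescale
        the survivors by 1/(1-p) so the expected activation is unchanged."""
        if not training or p == 0.0:
            return activations
        mask = (np.random.rand(*activations.shape) >= p) / (1.0 - p)
        return activations * mask

    hidden = np.random.randn(4, 8)          # a batch of hidden activations
    print(dropout(hidden, p=0.5))           # roughly half the units zeroed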
Data augmentation instead of explicit regularization
[article]
2020
arXiv
pre-print
Second, we contrast data augmentation with weight decay and dropout. ...
Despite the fact that some (explicit) regularization techniques, such as weight decay and dropout, require costly fine-tuning of sensitive hyperparameters, the interplay between them and other elements ...
Acknowledgments: This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 641805. ...
arXiv:1806.03852v5
fatcat:ussbftciyjhdhd2t4vmltkzs3m
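The contrast drawn in this entry is between implicit regularization via data augmentation and explicit regularizers such as weight decay and dropout. As a hedged sketch, not the paper's augmentation scheme, a light on-the-fly augmentation pipeline for 32x32 images might look like the following (the specific transforms and magnitudes are assumptions):

    import torchvision.transforms as T

    # Hypothetical light augmentation for 32x32 images; the transforms and
    # magnitudes are illustrative only.
    augment = T.Compose([
        T.RandomCrop(32, padding=4),
        T.RandomHorizontalFlip(),
        T.RandomAffine(degrees=10, translate=(0.1, 0.1)),
        T.ToTensor(),
    ])
    # Applied on the fly to each training image, e.g.
    #   train_set = torchvision.datasets.CIFAR10(root, train=True,
    #                                            transform=augment)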
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
[article]
2016
arXiv
pre-print
As a result, there is much interest in research and development of dedicated hardware for Deep Learning (DL). ...
Like other dropout schemes, we show that BinaryConnect acts as a regularizer, and we obtain near state-of-the-art results with BinaryConnect on permutation-invariant MNIST, CIFAR-10 and SVHN. ...
We preprocess the data using global contrast normalization and ZCA whitening. We do not use any data augmentation (which can really be a game changer for this dataset [35]). ...
arXiv:1511.00363v3
fatcat:ljfvx3fkwrewdljaau7ogvbyiu
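BinaryConnect keeps real-valued "latent" weights but uses their sign during propagation, updating the real values with gradients computed through the binary ones. A minimal, hedged PyTorch sketch of that binarize-in-the-forward-pass idea via a straight-through estimator (layer sizes and the clipping step are illustrative, not the paper's exact implementation):

    import torch
    import torch.nn as nn

    class BinarizeSTE(torch.autograd.Function):
        """Sign-binarize in the forward pass; pass gradients straight
        through to the real-valued latent weights in the backward pass."""
        @staticmethod
        def forward(ctx, weight):
            return torch.sign(weight)   # values in {-1, 0, +1}; 0 is negligible

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output

    class BinaryLinear(nn.Linear):
        def forward(self, x):
            binary_weight = BinarizeSTE.apply(self.weight)
            return nn.functional.linear(x, binary_weight, self.bias)

    layer = BinaryLinear(16, 8)
    out = layer(torch.randn(4, 16))
    out.sum().backward()                    # gradients flow to latent weights
    # BinaryConnect also clips the latent weights to [-1, 1] after updates:
    with torch.no_grad():
        layer.weight.clamp_(-1.0, 1.0)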
A continual learning survey: Defying forgetting in classification tasks
[article]
2020
arXiv
pre-print
We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation ...
Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. ...
LwF and EBLL suffer from using dropout, with only the DEEP model significantly improving performance. For weight decay the methods follow the general trend, enhancing the WIDE net accuracies. ...
arXiv:1909.08383v2
fatcat:vhvlwslqa5cefitcnajs7hp5nu
Medi-Care AI: Predicting Medications From Billing Codes via Robust Recurrent Neural Networks
[article]
2019
arXiv
pre-print
By doing this, billing codes are reformulated into their temporal patterns with decay rates on each medical variable, and the hidden states of RNNs are regularised by random noise which serves as dropout ...
We present a general robust framework that explicitly models the possible contamination through an over-time decay mechanism on the input billing codes and noise injection into the recurrent hidden states, ...
The dropout rate is 0.3 and norm-2 (L2) regularization is applied to the weight matrix W_code. ...
arXiv:2001.10065v1
fatcat:py6eteeyvvdibllyk2rufk5iqm
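The last snippet mentions a dropout rate of 0.3 and an L2 penalty applied to a single weight matrix, W_code. A hedged sketch of penalizing one named parameter while leaving the rest unregularized; the model, dimensions and coefficient below are placeholders, not the authors' code:

    import torch
    import torch.nn as nn

    class CodeEncoder(nn.Module):
        # Placeholder model: W_code embeds billing codes, a GRU models time.
        def __init__(self, n_codes=500, dim=64):
            super().__init__()
            self.W_code = nn.Parameter(torch.randn(n_codes, dim) * 0.01)
            self.rnn = nn.GRU(dim, dim, batch_first=True)
            self.dropout = nn.Dropout(p=0.3)

        def forward(self, code_ids):
            x = self.dropout(self.W_code[code_ids])   # (batch, time, dim)
            out, _ = self.rnn(x)
            return out

    model = CodeEncoder()
    code_ids = torch.randint(0, 500, (2, 7))
    task_loss = model(code_ids).pow(2).mean()          # stand-in for the real loss
    l2_lambda = 1e-4                                   # assumed coefficient
    loss = task_loss + l2_lambda * model.W_code.pow(2).sum()
    loss.backward()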
A contextual analysis of multi-layer perceptron models in classifying hand-written digits and letters: limited resources
[article]
2021
arXiv
pre-print
Using dimensionality reduction via PCA, we were able to increase that figure to 85.08% with only 10% of the original feature space, reducing the memory needed by 64%. ...
Classifying hand-written digits and letters has taken a big leap with the introduction of ConvNets. However, on very constrained hardware the time necessary to train such models would be high. ...
Availability of data and material The dataset is publicly available here: link (EMNIST Balanced dataset).
Code availability The computer code is available at: link. ...
arXiv:2107.01782v1
fatcat:mrlpwyes5ngtlgxrjhwryhh5pa
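The result quoted above relies on PCA to shrink the input to about 10% of the original feature space before training an MLP. A hedged scikit-learn sketch of that preprocessing step; random data stands in for EMNIST, and the component count simply follows the 10% figure:

    import numpy as np
    from sklearn.decomposition import PCA

    # Stand-in for flattened 28x28 EMNIST images (784 features per sample).
    X = np.random.rand(1000, 784).astype(np.float32)

    # Keep roughly 10% of the original feature space, as in the snippet.
    pca = PCA(n_components=78)
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                       # (1000, 78)
    print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
    # The same fitted `pca` must be applied to the test set via pca.transform.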
Highway State Gating for Recurrent Highway Networks: Improving Information Flow Through Time
[chapter]
2018
Lecture Notes in Computer Science
Training deep RNNs remains a challenge, and most state-of-the-art models are structured with a transition depth of 2-4 layers. ...
By using a gating mechanism for the state, we allow the net to "choose" whether to pass information directly through time, or to gate it. ...
For regularization, we use variational dropout [5] and L2 weight decay. The learning rate is exponentially decreased at each epoch. ...
doi:10.1007/978-3-319-94147-9_10
fatcat:h6brvrix3vgfthhgnvuyrojszu
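The gating mechanism described in this entry lets the network interpolate between carrying the previous state through time unchanged and replacing it with the new candidate. A hedged sketch of such a state gate in its generic form, s_t = g * s_{t-1} + (1 - g) * h_t, with placeholder dimensions rather than the paper's exact parameterization:

    import torch
    import torch.nn as nn

    class StateGate(nn.Module):
        """Gate between the previous state and a new candidate state."""
        def __init__(self, dim):
            super().__init__()
            self.gate = nn.Linear(2 * dim, dim)

        def forward(self, prev_state, candidate):
            g = torch.sigmoid(self.gate(torch.cat([prev_state, candidate], dim=-1)))
            # g close to 1: carry the old state through time unchanged;
            # g close to 0: take the newly computed candidate state.
            return g * prev_state + (1.0 - g) * candidate

    gate = StateGate(dim=32)
    s_prev = torch.randn(4, 32)
    h_new = torch.randn(4, 32)
    print(gate(s_prev, h_new).shape)    # torch.Size([4, 32])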
Regularizing CNNs with Locally Constrained Decorrelations
[article]
2017
arXiv
pre-print
In particular, we show that the models regularized with OrthoReg have higher accuracy bounds even when batch normalization and dropout are present. ...
Regularization is key for deep learning since it allows training more complex models while keeping lower levels of overfitting. ...
ACKNOWLEDGEMENTS: The authors acknowledge the support of the Spanish project TIN2015-65464-R (MINECO/FEDER), the 2016FI B 01163 grant of Generalitat de Catalunya, and the COST Action IC1307 iV&L Net (European ...
arXiv:1611.01967v2
fatcat:hyhvpm3zircjdebwsb7wybrqx4
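OrthoReg regularizes by penalizing correlated feature detectors, i.e., by pushing the rows of a weight matrix toward orthogonality. The sketch below shows only a generic decorrelation penalty on pairwise filter cosines; the paper's "locally constrained" variant (which penalizes only positively correlated filters) is omitted, and the coefficient is an assumption:

    import torch

    def decorrelation_penalty(weight: torch.Tensor) -> torch.Tensor:
        """Penalize squared cosine similarity between pairs of rows (filters)."""
        w = torch.nn.functional.normalize(weight, dim=1)   # unit-norm rows
        gram = w @ w.t()                                   # pairwise cosines
        off_diag = gram - torch.eye(gram.size(0))
        return off_diag.pow(2).sum()

    filters = torch.randn(64, 3 * 3 * 16, requires_grad=True)  # flattened conv filters
    penalty = decorrelation_penalty(filters)
    # Added to the task loss with a small coefficient, e.g.
    #   loss = task_loss + 1e-4 * penalty
    penalty.backward()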
Multi-Order Networks for Action Unit Detection
[article]
2022
arXiv
pre-print
Furthermore, we introduce warm up and order dropout to enhance order selection by encouraging order exploration. ...
Deep multi-task methods, where several tasks are learned within a single network, have recently attracted increasing attention. ...
Combined warmup and dropout provide MONET with an initial order guess that is likely to be good, and the ability to move away from this guess if needed. ...
arXiv:2202.00446v1
fatcat:soagz4ewujd4thgxxlx3apbp5u
On the Potential of Simple Framewise Approaches to Piano Transcription
[article]
2016
arXiv
pre-print
Exploiting recent advances in training techniques and new regularizers, and taking into account hyper-parameter tuning, we show that it is possible, by simple bottom-up frame-wise processing, to obtain ...
Thus, we propose this simple approach as a new baseline for this dataset, for future transcription research to build on and improve. ...
Weight Decay: To reduce overfitting and regularize the network, different priors can be imposed on the network weights. ...
arXiv:1612.05153v1
fatcat:b55ycsstjbbeljgr5zpkfeq76i
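The "Weight Decay" passage above refers to imposing a prior on the network weights; the standard choice is a zero-mean Gaussian prior, which in practice amounts to an L2 penalty added to the training loss. A hedged sketch with an illustrative coefficient:

    import torch
    import torch.nn as nn

    model = nn.Linear(100, 10)
    task_loss = model(torch.randn(8, 100)).pow(2).mean()   # stand-in loss

    # Explicit form: loss + (lambda/2) * sum of squared weights,
    # corresponding to a zero-mean Gaussian prior on the weights.
    weight_decay = 1e-4
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    loss = task_loss + 0.5 * weight_decay * l2

    # Equivalent in practice for plain SGD: pass weight_decay to the optimizer.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                weight_decay=weight_decay)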
On The Potential Of Simple Framewise Approaches To Piano Transcription
2016
Zenodo
Weight Decay: To reduce overfitting and regularize the network, different priors can be imposed on the network weights. ...
Not only does this alleviate the need for the weights of the subsequent layer to adapt to a changing input distribution during training, it also keeps the nonlinearities from saturating and in turn speeds ...
doi:10.5281/zenodo.1416488
fatcat:5ginh5wkqrehjjbwwdznqji3ya
Data augmentation and image understanding
[article]
2020
arXiv
pre-print
For that purpose, I have studied tools and aspects from cognitive science and computational neuroscience, and attempted to incorporate them into machine learning models of vision. ...
This dissertation explores some advantageous synergies between machine learning, cognitive science and neuroscience. In particular, this thesis focuses on vision and images. ...
Do deep nets really need weight decay and dropout? ...
arXiv:2012.14185v1
fatcat:qcip4vstzvbxzo4qevek5marrm
Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization
[article]
2019
arXiv
pre-print
Together, the redefinition of latent weights as inertia and the introduction of Bop enable a better understanding of BNN optimization and open the way for further improvements in training methodologies ...
In this paper, we argue that these latent weights cannot be treated analogously to weights in real-valued networks. Instead their main role is to provide inertia during training. ...
One way people currently translate knowledge from the real-valued network to the BNN is through initialization of the latent weights, which is becoming increasingly sophisticated [4, 8, 34]. ...
arXiv:1906.02107v2
fatcat:7fptrjarhnagna3ey6gy35kjc4
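This entry proposes replacing latent-weight updates with an optimizer (Bop) that decides only whether to flip each binary weight. Below is a hedged sketch of a Bop-style step under the rule described in the paper, flipping a weight when the exponential moving average of its gradient exceeds a threshold and agrees in sign with the weight; the decay and threshold values are illustrative:

    import torch

    def bop_step(binary_weights, grads, momentum, gamma=1e-3, tau=1e-6):
        """One Bop-style update: binary_weights in {-1, +1}; momentum is the
        running average of gradients and is modified in place."""
        momentum.mul_(1.0 - gamma).add_(gamma * grads)
        # Flip where the accumulated gradient is strong enough and pushes
        # against the current sign of the weight.
        flip = (momentum.abs() > tau) & (torch.sign(momentum) == torch.sign(binary_weights))
        binary_weights[flip] *= -1
        return binary_weights

    w = torch.sign(torch.randn(10))           # binary weights in {-1, +1}
    m = torch.zeros(10)                       # gradient moving average
    g = torch.randn(10)                       # stand-in gradient
    w = bop_step(w, g, m)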
Showing results 1 — 15 out of 891 results