Do deep nets really need weight decay and dropout? [article]

Alex Hernández-García, Peter König
2018 arXiv   pre-print
This overparameterization is often said to be controlled with the help of different regularization techniques, mainly weight decay and dropout.  ...  In this paper we build upon recent research that suggests that explicit regularization may not be as important as widely believed and carry out an ablation study that concludes that weight decay and dropout  ...  ACKNOWLEDGMENTS This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 641805.  ... 
arXiv:1802.07042v3 fatcat:c5jk6bpuhngqfbcrz7yca63oke

Do Deep Convolutional Nets Really Need to be Deep and Convolutional? [article]

Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson
2017 arXiv   pre-print
This paper provides the first empirical demonstration that deep convolutional models really need to be both deep and convolutional, even when trained with methods such as distillation that allow small  ...  Although the student models do not have to be as deep as the teacher model they mimic, the students need multiple convolutional layers to learn functions of comparable accuracy as the deep convolutional  ...  original 0/1 hard class labels using Bayesian optimization with dropout and weight decay.  ... 
arXiv:1603.05691v4 fatcat:nixa6yw7zje5nb2rdqrmf6tqwu

Improving neural networks by preventing co-adaptation of feature detectors [article]

Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov
2012 arXiv   pre-print
Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.  ...  It allows much larger nets to be trained and removes the need for early stopping.  ...  Deep Belief Nets - We took a neural network pretrained using a Deep Belief Network (5).  ... 
arXiv:1207.0580v1 fatcat:onupztn52jcl7az7ksbptz34lm

Data augmentation instead of explicit regularization [article]

Alex Hernández-García, Peter König
2020 arXiv   pre-print
Second, we contrast data augmentation with weight decay and dropout.  ...  Despite the fact that some (explicit) regularization techniques, such as weight decay and dropout, require costly fine-tuning of sensitive hyperparameters, the interplay between them and other elements  ...  Acknowledgments This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 641805.  ... 
arXiv:1806.03852v5 fatcat:ussbftciyjhdhd2t4vmltkzs3m

BinaryConnect: Training Deep Neural Networks with binary weights during propagations [article]

Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David
2016 arXiv   pre-print
As a result, there is much interest in research and development of dedicated hardware for Deep Learning (DL).  ...  Like other dropout schemes, we show that BinaryConnect acts as regularizer and we obtain near state-of-the-art results with BinaryConnect on the permutation-invariant MNIST, CIFAR-10 and SVHN.  ...  We preprocess the data using global contrast normalization and ZCA whitening. We do not use any data-augmentation (which can really be a game changer for this dataset [35]).  ... 
arXiv:1511.00363v3 fatcat:ljfvx3fkwrewdljaau7ogvbyiu

A continual learning survey: Defying forgetting in classification tasks [article]

Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, Tinne Tuytelaars
2020 arXiv   pre-print
We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation  ...  Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch.  ...  LwF and EBLL suffer from using dropout, with only the DEEP model significantly improving performance. For weight decay the methods follow the general trend, enhancing the WIDE net accuracies.  ... 
arXiv:1909.08383v2 fatcat:vhvlwslqa5cefitcnajs7hp5nu

Medi-Care AI: Predicting Medications From Billing Codes via Robust Recurrent Neural Networks [article]

Deyin Liu, Lin Wu, Xue Li
2019 arXiv   pre-print
By doing this, billing codes are reformulated into its temporal patterns with decay rates on each medical variable, and the hidden states of RNNs are regularised by random noises which serve as dropout  ...  We present a general robust framework that explicitly models the possible contamination through overtime decay mechanism on the input billing codes and noise injection into the recurrent hidden states,  ...  The dropout rate is 0.3 and the norm-2 regularization is applied into the weight matrix of W_code.  ... 
arXiv:2001.10065v1 fatcat:py6eteeyvvdibllyk2rufk5iqm

A contextual analysis of multi-layer perceptron models in classifying hand-written digits and letters: limited resources [article]

Tidor-Vlad Pricope
2021 arXiv   pre-print
Using dimensionality reduction done by PCA we were able to increase that figure to 85.08% with only 10% of the original feature space, reducing the memory size needed by 64%.  ...  Classifying hand-written digits and letters has taken a big leap with the introduction of ConvNets. However, on very constrained hardware the time necessary to train such models would be high.  ...  Availability of data and material The dataset is publicly available here: link (EMNIST Balanced dataset). Code availability The computer code is available at: link.  ... 
arXiv:2107.01782v1 fatcat:mrlpwyes5ngtlgxrjhwryhh5pa

Highway State Gating for Recurrent Highway Networks: Improving Information Flow Through Time [chapter]

Ron Shoham, Haim Permuter
2018 Lecture Notes in Computer Science  
Training deep RNNs still remains a challenge, and most of the state-of-the-art models are structured with a transition depth of 2-4 layers.  ...  By using a gating mechanism for the state, we allow the net to "choose" whether to pass information directly through time, or to gate it.  ...  For regularization, we use variational dropout [5], and L2 weight decay. The learning rate exponentially decreased at each epoch.  ... 
doi:10.1007/978-3-319-94147-9_10 fatcat:h6brvrix3vgfthhgnvuyrojszu

Regularizing CNNs with Locally Constrained Decorrelations [article]

Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca
2017 arXiv   pre-print
In particular, we show that the models regularized with OrthoReg have higher accuracy bounds even when batch normalization and dropout are present.  ...  Regularization is key for deep learning since it allows training more complex models while keeping lower levels of overfitting.  ...  ACKNOWLEDGEMENTS Authors acknowledge the support of the Spanish project TIN2015-65464-R (MINECO/FEDER), the 2016FI B 01163 grant of Generalitat de Catalunya, and the COST Action IC1307 iV&L Net (European  ... 
arXiv:1611.01967v2 fatcat:hyhvpm3zircjdebwsb7wybrqx4

Multi-Order Networks for Action Unit Detection [article]

Gauthier Tallec, Arnaud Dapogny, Kevin Bailly
2022 arXiv   pre-print
Furthermore, we introduce warm up and order dropout to enhance order selection by encouraging order exploration.  ...  Deep multi-task methods, where several tasks are learned within a single network, have recently attracted increasing attention.  ...  Combined warmup and dropout provide MONET with an initial order guess that is likely to be good, and the ability to move away from this guess if needed.  ... 
arXiv:2202.00446v1 fatcat:soagz4ewujd4thgxxlx3apbp5u

On the Potential of Simple Framewise Approaches to Piano Transcription [article]

Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, Gerhard Widmer
2016 arXiv   pre-print
Exploiting recent advances in training techniques and new regularizers, and taking into account hyper-parameter tuning, we show that it is possible, by simple bottom-up frame-wise processing, to obtain  ...  Thus, we propose this simple approach as a new baseline for this dataset, for future transcription research to build on and improve.  ...  Weight Decay To reduce overfitting and regularizing the network, different priors can be imposed on the network weights.  ... 
arXiv:1612.05153v1 fatcat:b55ycsstjbbeljgr5zpkfeq76i

On The Potential Of Simple Framewise Approaches To Piano Transcription

Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, Gerhard Widmer
2016 Zenodo  
Weight Decay To reduce overfitting and regularizing the network, different priors can be imposed on the network weights.  ...  Not only does this alleviate the need of the weights of the subsequent layer to adapt to a changing input distribution during training, it also keeps the nonlinearities from saturating and in turn speeds  ... 
doi:10.5281/zenodo.1416488 fatcat:5ginh5wkqrehjjbwwdznqji3ya

Data augmentation and image understanding [article]

Alex Hernandez-Garcia
2020 arXiv   pre-print
For that purpose, I have studied tools and aspects from cognitive science and computational neuroscience, and attempted to incorporate them into machine learning models of vision.  ...  This dissertation explores some advantageous synergies between machine learning, cognitive science and neuroscience. In particular, this thesis focuses on vision and images.  ...  Do deep nets really need weight decay and dropout?  ... 
arXiv:2012.14185v1 fatcat:qcip4vstzvbxzo4qevek5marrm

Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization [article]

Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng, Roeland Nusselder
2019 arXiv   pre-print
Together, the redefinition of latent weights as inertia and the introduction of Bop enable a better understanding of BNN optimization and open up the way for further improvements in training methodologies  ...  In this paper, we argue that these latent weights cannot be treated analogously to weights in real-valued networks. Instead their main role is to provide inertia during training.  ...  One way people currently translate knowledge from the real-valued network to the BNN is through initialization of the latent weights, which is becoming increasingly sophisticated [4, 8, 34].  ... 
arXiv:1906.02107v2 fatcat:7fptrjarhnagna3ey6gy35kjc4