Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
[article]
2017
arXiv
pre-print
We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. ...
But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks. ...
We would also like to acknowledge the work of Pranav Shyam on learning RNN hierarchies. ...
arXiv:1606.01305v4
fatcat:v5rzre4s6vautfndckcsk3u2d4
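The snippets above specify the zoneout update precisely enough to sketch. A minimal sketch in NumPy, assuming a single hidden vector and one fixed zoneout probability z; the paper also uses separate rates for LSTM cells and hidden states, which this sketch omits:

import numpy as np

def zoneout_step(h_prev, h_candidate, z=0.15, training=True):
    # Zoneout: each unit keeps its previous value with probability z,
    # otherwise it takes the cell's newly computed candidate value.
    if training:
        keep_old = (np.random.rand(*h_prev.shape) < z).astype(h_prev.dtype)
        return keep_old * h_prev + (1.0 - keep_old) * h_candidate
    # At evaluation time, interpolate with the expected value of the mask,
    # analogous to how dropout is rescaled at test time.
    return z * h_prev + (1.0 - z) * h_candidate

Because units are preserved rather than zeroed, state and gradient information can flow through the identity path across timesteps, which is the parallel the abstract draws to feedforward stochastic depth networks.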
Predicting Taxi Destination by Regularized RNN with SDZ
2018
IEICE transactions on information and systems
In order to improve the prediction accuracy of the taxi destination and reduce the training time, we embed surprisal-driven zoneout (SDZ) into the RNN, yielding a taxi destination prediction method based on a regularized RNN ...
We adopt a Recurrent Neural Network (RNN) to capture the long-term dependencies needed to predict the taxi destination, since the multiple hidden layers of an RNN can store these dependencies. ...
Acknowledgements This work was supported by the Fundamental Research Funds for the Central Universities (2017XKQY078). ...
doi:10.1587/transinf.2018edl8009
fatcat:tujxepyskvbfhox6dgyrj3saoy
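The snippet names surprisal-driven zoneout (SDZ) without defining it, so the sketch below only illustrates the general idea of modulating the zoneout rate by the surprisal (negative log-probability) of the previous prediction. The parameters base_rate and scale, and the choice to preserve more units when surprisal is high, are assumptions for illustration rather than the formulation used in the SDZ or taxi-destination papers:

import numpy as np

def sdz_step(h_prev, h_candidate, prev_target_prob, base_rate=0.1, scale=0.05):
    # Surprisal of the previous prediction; larger when the model was "surprised".
    surprisal = -np.log(prev_target_prob + 1e-12)
    # Illustrative coupling (assumption): the zoneout rate grows with surprisal.
    z = float(np.clip(base_rate + scale * surprisal, 0.0, 0.95))
    keep_old = (np.random.rand(*h_prev.shape) < z).astype(h_prev.dtype)
    return keep_old * h_prev + (1.0 - keep_old) * h_candidate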
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
[article]
2018
arXiv
pre-print
Therefore, ARNet encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs). ...
Furthermore, the performance on permuted sequential MNIST demonstrates that ARNet can effectively regularize RNNs, especially when modeling long-term dependencies. ...
Zoneout regularizes RNNs by randomly preserving hidden activations, stochastically forcing parts of the hidden state and memory cell to maintain their previous values at each time step. ...
arXiv:1803.11439v2
fatcat:4tao76xsobcprcbu5kmbhnank4
Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present
2018
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Therefore, ARNet encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs). ...
Furthermore, the performance on permuted sequential MNIST demonstrates that ARNet can effectively regularize RNNs, especially when modeling long-term dependencies. ...
Zoneout regularizes RNNs by randomly preserving hidden activations, stochastically forcing parts of the hidden state and memory cell to maintain their previous values at each time step. ...
doi:10.1109/cvpr.2018.00834
dblp:conf/cvpr/Chen0JY018
fatcat:7mporxrrtfa45jckbiv7oujp2q
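Both ARNet records above describe the same mechanism: an auxiliary module tries to reconstruct the previous hidden state from the current one, and the reconstruction error is added to the training loss so that the recurrent transition stays regular. A minimal sketch with a linear reconstructor and an unspecified weight lambda_ar; the paper's reconstruction module and weighting are more elaborate, so treat this as illustrative:

import torch
import torch.nn as nn

class ARNetStyleRegularizer(nn.Module):
    # Reconstruct h_{t-1} from h_t and penalize the squared error.
    def __init__(self, hidden_size):
        super().__init__()
        self.reconstruct = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states):
        # hidden_states: (T, B, H) sequence of RNN hidden states.
        h_t, h_prev = hidden_states[1:], hidden_states[:-1]
        return ((self.reconstruct(h_t) - h_prev) ** 2).mean()

# total_loss = task_loss + lambda_ar * regularizer(hidden_states)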
Shifting Mean Activation Towards Zero with Bipolar Activation Functions
[article]
2018
arXiv
pre-print
We explore the training of deep vanilla recurrent neural networks (RNNs) with up to 144 layers, and show that bipolar activation functions help learning in this setting. ...
We propose a simple extension to the ReLU-family of activation functions that allows them to shift the mean activation across a layer towards zero. ...
Since good performance on this dataset is highly dependent on regularization, in order to get a fair comparison of various depths and activation functions, we need to find good regularization parameters ...
arXiv:1709.04054v3
fatcat:77c6wfdymfasplvq2dvqs6j45y
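The abstract describes the bipolar construction only at a high level; one natural way to let a ReLU-family activation shift the layer mean toward zero is to apply the activation to half of the units and a sign-mirrored version to the other half. The even/odd split below is an assumption about how the halves are chosen:

import numpy as np

def bipolar(f, x):
    # Apply f to even-indexed units and the mirrored -f(-x) to odd-indexed
    # units along the last axis, so positive and negative outputs can cancel.
    out = np.empty_like(x)
    out[..., 0::2] = f(x[..., 0::2])
    out[..., 1::2] = -f(-x[..., 1::2])
    return out

relu = lambda v: np.maximum(v, 0.0)
# bipolar(relu, pre_activations) behaves as a bipolar ReLU layer.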
Revisiting Activation Regularization for Language RNNs
[article]
2017
arXiv
pre-print
We revisit traditional regularization techniques, specifically L2 regularization on RNN activations and slowness regularization over successive hidden states, to improve the performance of RNNs on the ...
Both of these techniques require minimal modification to existing RNN architectures and result in performance improvements comparable or superior to more complicated regularization techniques or custom ...
Zoneout (Krueger et al., 2016) prevents hidden state updates from occurring by setting a randomly selected subset of network unit activations in h_{t+1} to be equal to the previous activations from h_t ...
arXiv:1708.01009v1
fatcat:wk6vg7enmjharp4jvkknr6skwq
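The two techniques this entry revisits reduce to two penalty terms added to the training loss: an L2 penalty on the RNN activations themselves, and a slowness penalty on the difference between successive hidden states. A minimal sketch; the weights alpha and beta are placeholders rather than tuned values, and any interaction with dropout masks is omitted:

import torch

def activation_regularization(hidden, alpha=1.0, beta=1.0):
    # hidden: (T, B, H) stack of hidden states from the RNN.
    ar = alpha * hidden.pow(2).mean()                      # L2 on activations
    tar = beta * (hidden[1:] - hidden[:-1]).pow(2).mean()  # slowness over time
    return ar + tar

# loss = cross_entropy + activation_regularization(hidden)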
Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
2019
Neural Information Processing Systems
dependencies spanning 100,000 time-steps, converge rapidly, and use few internal state-variables to learn complex functions spanning long windows of time, exceeding state-of-the-art performance among RNNs ...
The Legendre Memory Unit (LMU) is mathematically derived to orthogonalize its continuous-time history, doing so by solving d coupled ordinary differential equations (ODEs), whose phase space linearly maps ...
Acknowledgments We thank the reviewers for improving our work by identifying areas in need of clarification and suggesting additional points of validation. ...
dblp:conf/nips/VoelkerKE19
fatcat:dcl2n6xrvrhsbptiucaxgi5d2e
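The snippet says the LMU memory is derived by solving d coupled ODEs whose state linearly maps a sliding window of the input onto Legendre polynomials. Below is a compact sketch of that linear memory with the (A, B) matrices as I read them from the paper and a plain Euler step; the paper's discretization and its coupling to a nonlinear hidden state are omitted, so treat the details as an approximation:

import numpy as np

def lmu_matrices(d):
    # Continuous-time state-space matrices of the LMU memory (my reading).
    A = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            A[i, j] = (2 * i + 1) * (-1.0 if i < j else (-1.0) ** (i - j + 1))
    q = np.arange(d)
    B = ((2 * q + 1) * (-1.0) ** q).reshape(d, 1)
    return A, B

def lmu_memory_step(m, u, A, B, theta=100.0, dt=1.0):
    # Euler step of theta * dm/dt = A m + B u; m holds d Legendre coefficients
    # approximating the last `theta` steps of the scalar input u.
    return m + (dt / theta) * (A @ m + B * u)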
Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
[article]
2018
arXiv
pre-print
Empirically, flipout achieves the ideal linear variance reduction for fully connected networks, convolutional networks, and RNNs. ...
We show that flipout is effective at regularizing LSTMs, and outperforms previous methods. ...
ACKNOWLEDGMENTS YW was supported by an NSERC USRA award, and PV was supported by a Connaught New Researcher Award. We thank David Duvenaud, Alex Graves, Geoffrey Hinton, and Matthew D. ...
arXiv:1803.04386v2
fatcat:xfuj62jlirdqpk4emdfgixynnu
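Flipout's variance-reduction trick, as described above, is to share one sampled weight perturbation across the mini-batch while giving each example its own rank-one ±1 sign pattern, so the per-example perturbations are pseudo-independent. A minimal dense-layer sketch assuming a Gaussian perturbation with elementwise scale W_perturb_std (the parameter names are illustrative):

import numpy as np

def flipout_dense(X, W_mean, W_perturb_std, rng=None):
    # X: (N, d_in); W_mean, W_perturb_std: (d_in, d_out).
    rng = np.random.default_rng() if rng is None else rng
    n, d_in = X.shape
    d_out = W_mean.shape[1]
    dW = W_perturb_std * rng.standard_normal(W_mean.shape)  # shared perturbation
    S = rng.choice([-1.0, 1.0], size=(n, d_in))              # per-example input signs
    R = rng.choice([-1.0, 1.0], size=(n, d_out))             # per-example output signs
    # Equivalent to using W_mean + dW * outer(s_n, r_n) for each example n.
    return X @ W_mean + ((X * S) @ dW) * R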
Survey of Dropout Methods for Deep Neural Networks
[article]
2019
arXiv
pre-print
They have been successfully applied in neural network regularization, model compression, and in measuring the uncertainty of neural network outputs. ...
memory in RNNs is Zoneout [23]. ...
This is done by only applying dropout to the part of the RNN that updates the hidden state and not the state itself. ...
arXiv:1904.13310v2
fatcat:psmwelybpfb7jo57pnafgl6ale
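The "dropout on the part that updates the hidden state, not the state itself" description can be written out explicitly. With a per-unit Bernoulli mask d_t (1 = preserve) and candidate state \tilde{h}_t, the two forms are algebraically identical:

    h_t = d_t \odot h_{t-1} + (1 - d_t) \odot \tilde{h}_t
        = h_{t-1} + (1 - d_t) \odot (\tilde{h}_t - h_{t-1}),
    \qquad d_t \sim \mathrm{Bernoulli}(z),

so the stochastic mask gates only the update term \tilde{h}_t - h_{t-1}, while h_{t-1} itself always passes through.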
A Survey on Dropout Methods and Experimental Verification in Recommendation
[article]
2022
arXiv
pre-print
From randomly dropping neurons to dropping neural structures, dropout has achieved great success in improving model performance. ...
various dropout methods have been designed and widely applied in past years, their effectiveness, application scenarios, and contributions have not been comprehensively summarized and empirically compared by ...
For RNNs, early applications of dropout [14] only drop feedforward connections, in order to preserve the memory ability of RNN. ...
arXiv:2204.02027v2
fatcat:js3i2laehvbcjekzskzu4uvxf4
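The "only drop feedforward connections" recipe mentioned in this entry amounts to applying dropout on the layer-to-layer inputs of a stacked RNN while the recurrent state flows through time undropped. A simplified sketch; the class and argument names are illustrative:

import torch
import torch.nn as nn

class FeedforwardOnlyDropoutRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers=2, p=0.5):
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.RNNCell(input_size if i == 0 else hidden_size, hidden_size)
             for i in range(num_layers)]
        )
        self.drop = nn.Dropout(p)

    def forward(self, x, hs=None):
        # x: (T, B, input_size); hs: list of per-layer hidden states.
        if hs is None:
            hs = [x.new_zeros(x.size(1), c.hidden_size) for c in self.cells]
        outputs = []
        for x_t in x:
            inp = x_t
            for i, cell in enumerate(self.cells):
                # Dropout is applied to the feedforward input only;
                # the recurrent state hs[i] is passed through untouched.
                hs[i] = cell(self.drop(inp), hs[i])
                inp = hs[i]
            outputs.append(inp)
        return torch.stack(outputs), hs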
Recent Advances in Recurrent Neural Networks
[article]
2018
arXiv
pre-print
A well-trained RNN can model any dynamical system; however, training RNNs is mostly plagued by issues in learning long-term dependencies. ...
RNNs have a stack of non-linear units where at least one connection between units forms a directed cycle. ...
Hidden Activation Preservation: The zoneout method is a special case of dropout. It forces some units to keep their activation from the previous timestep (i.e., h_t = h_{t-1}) [85]. ...
arXiv:1801.01078v3
fatcat:ioxziqbkmzdrfoh2kukul6xlku
Rotational Unit of Memory
[article]
2017
arXiv
pre-print
The core of RUM is its rotational operation, which is, naturally, a unitary matrix, providing architectures with the power to learn long-term dependencies by overcoming the vanishing and exploding gradients ...
However, RNNs still have a limited capacity to manipulate long-term memory. To bypass this weakness, the most successful applications of RNNs use external techniques such as attention mechanisms. ...
This work was partially supported by the Army ...
arXiv:1710.09537v1
fatcat:efsi4uk7kzhgpg5a3uu5a55cna
Rotational Unit of Memory: A Novel Representation Unit for RNNs with Scalable Applications
2019
Transactions of the Association for Computational Linguistics
We further demonstrate that by replacing LSTM/GRU with RUM units we can apply neural networks to real-world problems such as language modeling and text summarization, yielding results comparable to the ...
We show experimentally that RNNs based on RUMs can solve basic sequential tasks such as memory copying and memory recall much better than LSTMs/GRUs. ...
The cell zoneout/hidden zoneout/dropout probability is 0.5/0.9/0.35 for FS-RUM-2, and 0.5/0.1/0.65 for the vanilla versions. We train for 100 epochs with a 0.002 learning rate. ...
doi:10.1162/tacl_a_00258
fatcat:r4mfweh4aneitd76fbnx6mefv4
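For reference, the regularization settings quoted in this snippet, gathered into one place; the key names are illustrative, since the entry does not define a configuration schema:

# Values quoted in the snippet above; key names are assumptions.
RUM_REGULARIZATION = {
    "FS-RUM-2": {"cell_zoneout": 0.5, "hidden_zoneout": 0.9, "dropout": 0.35},
    "vanilla":  {"cell_zoneout": 0.5, "hidden_zoneout": 0.1, "dropout": 0.65},
    "epochs": 100,
    "learning_rate": 0.002,
}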
Recurrent Batch Normalization
[article]
2017
arXiv
pre-print
Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition ...
the mean and standard deviation of the normalized activation, and ε ∈ R is a regularization hyperparameter. ...
The training sequence does not cleanly divide by 100, so for each epoch we randomly crop a subsequence that does and segment that instead. ...
arXiv:1603.09025v5
fatcat:eyradinuvrfxbpo3cclkyi3vhy
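The point of this entry is that the hidden-to-hidden transition itself can be batch-normalized, not only the input-to-hidden transformation. A simplified vanilla-RNN sketch; the paper works with LSTMs, keeps separate statistics per timestep, and initializes the BN gain carefully, none of which appears here:

import torch
import torch.nn as nn

class BatchNormRNNCell(nn.Module):
    # Normalize the recurrent and input pre-activations separately,
    # then combine them before the nonlinearity.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W_x = nn.Linear(input_size, hidden_size, bias=True)
        self.W_h = nn.Linear(hidden_size, hidden_size, bias=False)
        self.bn_x = nn.BatchNorm1d(hidden_size)
        self.bn_h = nn.BatchNorm1d(hidden_size)

    def forward(self, x_t, h_prev):
        # x_t: (B, input_size); h_prev: (B, hidden_size)
        return torch.tanh(self.bn_x(self.W_x(x_t)) + self.bn_h(self.W_h(h_prev)))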
Recurrent Memory Array Structures
[article]
2016
arXiv
pre-print
The following report introduces ideas augmenting the standard Long Short-Term Memory (LSTM) architecture with multiple memory cells per hidden unit in order to improve its generalization capabilities. ...
Acknowledgements This work has been supported in part by the Defense Advanced Research Projects Agency (DARPA). ...
, including Recurrent Dropout (Zaremba et al., 2014), is caused by the fact that all or almost all hidden-state activations can be zeros. ...
arXiv:1607.03085v3
fatcat:c7m2dn7lbjba5j76o73vvtaoli
Showing results 1 — 15 out of 30 results