30 Hits in 3.6 sec

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations [article]

David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal
2017 arXiv   pre-print
We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values.  ...  But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks.  ...  We would also like to acknowledge the work of Pranav Shyam on learning RNN hierarchies.  ... 
arXiv:1606.01305v4 fatcat:v5rzre4s6vautfndckcsk3u2d4
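As the abstract above describes, zoneout replaces dropout's zeroing with copying: at each timestep a random subset of units keeps its previous value, and at test time the stochastic mask is replaced by its expectation. A minimal NumPy sketch of that update (the function name and plain-RNN setting are illustrative, not taken from the paper):

```python
import numpy as np

def zoneout_update(h_prev, h_new, z=0.15, training=True):
    """Zoneout on a hidden state: each unit keeps its previous value with
    probability z, otherwise takes the freshly computed value."""
    if training:
        keep = np.random.rand(*h_prev.shape) < z   # True -> preserve previous value
        return np.where(keep, h_prev, h_new)
    return z * h_prev + (1.0 - z) * h_new          # expected update at inference
```

Because preserved units pass their state (and gradient) through unchanged, state and gradient information propagate further through time than with ordinary dropout.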

Predicting Taxi Destination by Regularized RNN with SDZ

Lei ZHANG, Guoxing ZHANG, Zhizheng LIANG, Qingfu FAN, Yadong LI
2018 IEICE transactions on information and systems  
In order to improve the prediction accuracy of taxi destination and reduce the training time, we embed surprisal-driven zoneout (SDZ) into the RNN, hence a taxi destination prediction method by regularized RNN  ...  We adopt a Recurrent Neural Network (RNN) to explore long-term dependencies to predict the taxi destination, as the multiple hidden layers of the RNN can store these dependencies.  ...  Acknowledgements This work was supported by the Fundamental Research Funds for the Central Universities (2017XKQY078).  ...
doi:10.1587/transinf.2018edl8009 fatcat:tujxepyskvbfhox6dgyrj3saoy

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present [article]

Xinpeng Chen and Lin Ma and Wenhao Jiang and Jian Yao and Wei Liu
2018 arXiv   pre-print
Therefore, ARNet encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs).  ...  Furthermore, the performance on permuted sequential MNIST demonstrates that ARNet can effectively regularize RNNs, especially on modeling long-term dependencies.  ...  Zoneout regularizes RNNs by randomly preserving hidden activations: it stochastically forces some parts of the hidden units and memory cells to maintain their previous values at each time step.  ...
arXiv:1803.11439v2 fatcat:4tao76xsobcprcbu5kmbhnank4
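The ARNet idea above adds an auxiliary reconstruction objective: the current hidden state should be able to reconstruct the previous one. A hedged sketch of such a term (the paper couples a recurrent reconstruction network to the decoder; the linear reconstructor and class name here are illustrative stand-ins):

```python
import torch
import torch.nn as nn

class PastReconstructionLoss(nn.Module):
    """Auxiliary loss in the spirit of ARNet: predict h_{t-1} from h_t,
    encouraging the current hidden state to retain information about the
    previous one and smoothing the transition dynamics."""
    def __init__(self, hidden_size):
        super().__init__()
        self.reconstruct = nn.Linear(hidden_size, hidden_size)  # stand-in reconstructor

    def forward(self, hidden_states):            # hidden_states: (T, B, H)
        h_curr = hidden_states[1:]               # h_1 ... h_{T-1}
        h_prev = hidden_states[:-1]              # reconstruction targets
        return ((self.reconstruct(h_curr) - h_prev) ** 2).mean()
```

The reconstruction term is added to the main training objective with a small weight.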

Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present

Xinpeng Chen, Lin Ma, Wenhao Jiang, Jian Yao, Wei Liu
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
Therefore, ARNet encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs).  ...  Furthermore, the performance on permuted sequential MNIST demonstrates that ARNet can effectively regularize RNNs, especially on modeling long-term dependencies.  ...  Zoneout regularizes RNNs by randomly preserving hidden activations: it stochastically forces some parts of the hidden units and memory cells to maintain their previous values at each time step.  ...
doi:10.1109/cvpr.2018.00834 dblp:conf/cvpr/Chen0JY018 fatcat:7mporxrrtfa45jckbiv7oujp2q

Shifting Mean Activation Towards Zero with Bipolar Activation Functions [article]

Lars Eidnes, Arild Nøkland
2018 arXiv   pre-print
We explore the training of deep vanilla recurrent neural networks (RNNs) with up to 144 layers, and show that bipolar activation functions help learning in this setting.  ...  We propose a simple extension to the ReLU-family of activation functions that allows them to shift the mean activation across a layer towards zero.  ...  Since good performance on this dataset is highly dependent on regularization, in order to get a fair comparison of various depths and activation functions, we need to find good regularization parameters  ... 
arXiv:1709.04054v3 fatcat:77c6wfdymfasplvq2dvqs6j45y
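The bipolar construction referenced above can be stated compactly: apply the activation f to half of the units and its point-reflected version -f(-x) to the other half, so positive and negative outputs balance and the layer mean is pushed toward zero. A small PyTorch sketch (the interleaved even/odd split is one natural way to realize the half-and-half assignment):

```python
import torch

def bipolar(f, x):
    """Bipolar version of an elementwise activation along the last dimension:
    even-indexed units get f(x), odd-indexed units get -f(-x)."""
    out = torch.empty_like(x)
    out[..., 0::2] = f(x[..., 0::2])
    out[..., 1::2] = -f(-x[..., 1::2])
    return out

y_relu = bipolar(torch.relu, torch.randn(4, 8))               # bipolar ReLU
y_elu = bipolar(torch.nn.functional.elu, torch.randn(4, 8))   # bipolar ELU
```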

Revisiting Activation Regularization for Language RNNs [article]

Stephen Merity, Bryan McCann, Richard Socher
2017 arXiv   pre-print
We revisit traditional regularization techniques, specifically L2 regularization on RNN activations and slowness regularization over successive hidden states, to improve the performance of RNNs on the  ...  Both of these techniques require minimal modification to existing RNN architectures and result in performance improvements comparable or superior to more complicated regularization techniques or custom  ...  Zoneout (Krueger et al., 2016) prevents hidden state updates from occurring by setting a randomly selected subset of network unit activations in h_{t+1} to be equal to the previous activations from h_t  ...
arXiv:1708.01009v1 fatcat:wk6vg7enmjharp4jvkknr6skwq
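The two penalties revisited in this paper drop into any RNN language model: an L2 penalty on the (dropout-masked) output activations (AR) and a "slowness" penalty on the difference between successive hidden states (TAR). A sketch in the mean-squared form; the coefficient values are illustrative:

```python
import torch

def activation_regularization(dropped_output, alpha=2.0):
    """AR: penalize large RNN output activations (applied to the
    dropout-masked outputs so only active units are penalized)."""
    return alpha * dropped_output.pow(2).mean()

def temporal_activation_regularization(output, beta=1.0):
    """TAR (slowness): penalize large changes between successive
    hidden states; output shaped (T, B, H)."""
    return beta * (output[1:] - output[:-1]).pow(2).mean()
```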

Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

Aaron Voelker, Ivana Kajic, Chris Eliasmith
2019 Neural Information Processing Systems  
dependencies spanning 100,000 time-steps, converge rapidly, and use few internal state-variables to learn complex functions spanning long windows of time, exceeding state-of-the-art performance among RNNs  ...  The Legendre Memory Unit (LMU) is mathematically derived to orthogonalize its continuous-time history, doing so by solving d coupled ordinary differential equations (ODEs), whose phase space linearly maps  ...  Acknowledgments We thank the reviewers for improving our work by identifying areas in need of clarification and suggesting additional points of validation.  ...
dblp:conf/nips/VoelkerKE19 fatcat:dcl2n6xrvrhsbptiucaxgi5d2e
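The LMU's memory is a small linear ODE, theta * m'(t) = A m(t) + B u(t), whose d state variables encode a sliding window of the input in a Legendre basis. A hedged sketch, using the (A, B) construction described in the paper and a simple Euler step for illustration (the paper discretizes the ODEs more carefully):

```python
import numpy as np

def lmu_matrices(d):
    """State-space matrices of the LMU memory (per the paper's construction)."""
    A = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            A[i, j] = (2 * i + 1) * (-1.0 if i < j else (-1.0) ** (i - j + 1))
    B = np.array([(2 * i + 1) * (-1.0) ** i for i in range(d)]).reshape(d, 1)
    return A, B

def lmu_memory_step(m_prev, u_t, A, B, theta, dt=1.0):
    """One Euler step of theta * m'(t) = A m(t) + B u(t).
    m_prev: (d, 1) memory, u_t: scalar input, theta: window length in steps."""
    return m_prev + (dt / theta) * (A @ m_prev + B * u_t)
```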

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches [article]

Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse
2018 arXiv   pre-print
Empirically, flipout achieves the ideal linear variance reduction for fully connected networks, convolutional networks, and RNNs.  ...  We show that flipout is effective at regularizing LSTMs, and outperforms previous methods.  ...  ACKNOWLEDGMENTS YW was supported by an NSERC USRA award, and PV was supported by a Connaught New Researcher Award. We thank David Duvenaud, Alex Graves, Geoffrey Hinton, and Matthew D.  ... 
arXiv:1803.04386v2 fatcat:xfuj62jlirdqpk4emdfgixynnu
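Flipout decorrelates weight perturbations within a mini-batch without sampling a separate weight matrix per example: a single perturbation is shared, and each example multiplies it by independent random sign vectors on the input and output sides. A sketch for a linear layer (the function name and the (out, in) weight layout are assumptions):

```python
import torch

def flipout_linear(x, W_mean, dW_sample):
    """x: (batch, in). W_mean, dW_sample: (out, in); dW_sample is one draw of a
    symmetric (zero-mean) weight perturbation shared across the batch.
    Per-example sign vectors r, s yield pseudo-independent perturbations."""
    batch = x.shape[0]
    out_f, in_f = W_mean.shape
    r = torch.randint(0, 2, (batch, out_f)).to(x.dtype) * 2 - 1
    s = torch.randint(0, 2, (batch, in_f)).to(x.dtype) * 2 - 1
    return x @ W_mean.t() + ((x * s) @ dW_sample.t()) * r
```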

Survey of Dropout Methods for Deep Neural Networks [article]

Alex Labach, Hojjat Salehinejad, Shahrokh Valaee
2019 arXiv   pre-print
They have been successfully applied in neural network regularization, model compression, and in measuring the uncertainty of neural network outputs.  ...  memory in RNNs is Zoneout [23].  ...  This is done by only applying dropout to the part of the RNN that updates the hidden state and not the state itself.  ...
arXiv:1904.13310v2 fatcat:psmwelybpfb7jo57pnafgl6ale
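The scheme described in the last snippet, dropping only the part of the RNN that writes new information while leaving the carried state untouched, can be illustrated on a GRU step; the explicit weight matrices are for exposition only:

```python
import torch
import torch.nn.functional as F

def gru_step_update_dropout(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh, p=0.3, training=True):
    """GRU step with dropout applied only to the candidate update h_tilde;
    the previous hidden state flows through untouched, so memory is preserved."""
    z = torch.sigmoid(x_t @ Wz + h_prev @ Uz)             # update gate
    r = torch.sigmoid(x_t @ Wr + h_prev @ Ur)             # reset gate
    h_tilde = torch.tanh(x_t @ Wh + (r * h_prev) @ Uh)    # candidate (new information)
    h_tilde = F.dropout(h_tilde, p=p, training=training)  # drop only the update
    return (1.0 - z) * h_prev + z * h_tilde
```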

A Survey on Dropout Methods and Experimental Verification in Recommendation [article]

Yangkun Li, Weizhi Ma, Chong Chen, Min Zhang, Yiqun Liu, Shaoping Ma, Yuekui Yang
2022 arXiv   pre-print
From randomly dropping neurons to dropping neural structures, dropout has achieved great success in improving model performances.  ...  various dropout methods have been designed and widely applied in the past years, their effectiveness, application scenarios, and contributions have not been comprehensively summarized and empirically compared by  ...  For RNNs, early applications of dropout [14] only drop feedforward connections, in order to preserve the memory ability of RNNs.  ...
arXiv:2204.02027v2 fatcat:js3i2laehvbcjekzskzu4uvxf4
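The "early application" mentioned above drops only the vertical (layer-to-layer) connections of a stacked RNN and leaves the recurrent connections intact. A minimal sketch of that placement (class name and layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class StackedRNNInterlayerDropout(nn.Module):
    """Dropout on feedforward connections between stacked RNN layers only;
    the recurrent (step-to-step) connections are never dropped, so the
    hidden state can carry memory across time undisturbed."""
    def __init__(self, input_size, hidden_size, num_layers=2, p=0.5):
        super().__init__()
        sizes = [input_size] + [hidden_size] * num_layers
        self.layers = nn.ModuleList(
            [nn.RNN(sizes[i], sizes[i + 1], batch_first=True) for i in range(num_layers)]
        )
        self.drop = nn.Dropout(p)

    def forward(self, x):                 # x: (B, T, input_size)
        out = x
        for rnn in self.layers:
            out, _ = rnn(self.drop(out))  # dropout on the feedforward input only
        return out
```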

Recent Advances in Recurrent Neural Networks [article]

Hojjat Salehinejad, Sharan Sankar, Joseph Barfett, Errol Colak, Shahrokh Valaee
2018 arXiv   pre-print
A well-trained RNN can model any dynamical system; however, training RNNs is mostly plagued by issues in learning long-term dependencies.  ...  RNNs have a stack of non-linear units where at least one connection between units forms a directed cycle.  ...  Hidden Activation Preservation: The zoneout method is a very special case of dropout. It forces some units to keep their activation from the previous timestep (i.e., h_t = h_{t-1}) [85].  ...
arXiv:1801.01078v3 fatcat:ioxziqbkmzdrfoh2kukul6xlku

Rotational Unit of Memory [article]

Rumen Dangovski and Li Jing and Marin Soljacic
2017 arXiv   pre-print
The core of RUM is its rotational operation, which is, naturally, a unitary matrix, providing architectures with the power to learn long-term dependencies by overcoming the vanishing and exploding gradients  ...  However, RNNs still have a limited capacity to manipulate long-term memory. To bypass this weakness, the most successful applications of RNNs use external techniques such as attention mechanisms.  ...  This work was partially supported by the Army  ...
arXiv:1710.09537v1 fatcat:efsi4uk7kzhgpg5a3uu5a55cna
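The "rotational operation" referred to above is an orthogonal (real unitary) matrix acting on the hidden state, which preserves norms and so counteracts vanishing and exploding gradients. As a hedged illustration of this kind of operator, one standard construction rotates within the plane spanned by two vectors (the paper's exact parameterization and its composition with the memory differ):

```python
import numpy as np

def plane_rotation(a, b, eps=1e-8):
    """Orthogonal matrix rotating the direction of a onto the direction of b,
    acting as the identity on the subspace orthogonal to span{a, b}."""
    u = a / (np.linalg.norm(a) + eps)
    w = b - (b @ u) * u
    v = w / (np.linalg.norm(w) + eps)
    cos = float(np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps), -1.0, 1.0))
    sin = float(np.sqrt(1.0 - cos ** 2))
    I = np.eye(a.shape[0])
    return (I - np.outer(u, u) - np.outer(v, v)
              + cos * (np.outer(u, u) + np.outer(v, v))
              + sin * (np.outer(v, u) - np.outer(u, v)))
```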

Rotational Unit of Memory: A Novel Representation Unit for RNNs with Scalable Applications

Rumen Dangovski, Li Jing, Preslav Nakov, Mićo Tatalović, Marin Soljačić
2019 Transactions of the Association for Computational Linguistics  
We further demonstrate that by replacing LSTM/GRU with RUM units we can apply neural networks to real-world problems such as language modeling and text summarization, yielding results comparable to the  ...  We show experimentally that RNNs based on RUMs can solve basic sequential tasks such as memory copying and memory recall much better than LSTMs/GRUs.  ...  The cell zoneout/hidden zoneout/dropout probability is 0.5/0.9/0.35 for FS-RUM-2, and 0.5/0.1/0.65 for the vanilla versions. We train for 100 epochs with a 0.002 learning rate.  ... 
doi:10.1162/tacl_a_00258 fatcat:r4mfweh4aneitd76fbnx6mefv4

Recurrent Batch Normalization [article]

Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville
2017 arXiv   pre-print
Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition  ...  the mean and standard deviation of the normalized activation, and ε ∈ R is a regularization hyperparameter.  ...  The training sequence does not cleanly divide by 100, so for each epoch we randomly crop a subsequence that does and segment that instead.  ...
arXiv:1603.09025v5 fatcat:eyradinuvrfxbpo3cclkyi3vhy
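The snippet above refers to the batch-normalization transform with a learned scale, a shift, and a small ε in the denominator, applied to the hidden-to-hidden (as well as input-to-hidden) pre-activations. A simplified sketch of one batch-normalized recurrent step (the paper's full BN-LSTM also normalizes the cell and keeps per-time-step statistics; the zero per-term shifts and shapes here are assumptions reflecting common practice):

```python
import torch

def batch_norm(x, gamma, eps=1e-5):
    """Normalize x (batch, features) over the batch dimension, then rescale
    by gamma; eps is the regularization hyperparameter in the denominator."""
    mean = x.mean(dim=0, keepdim=True)
    var = x.var(dim=0, unbiased=False, keepdim=True)
    return gamma * (x - mean) / torch.sqrt(var + eps)

def bn_rnn_step(x_t, h_prev, W_x, W_h, gamma_x, gamma_h, bias):
    """Batch-normalize the input-to-hidden and hidden-to-hidden
    pre-activations separately, then add a single bias (per-term shifts
    are kept at zero to avoid redundancy)."""
    pre = batch_norm(x_t @ W_x, gamma_x) + batch_norm(h_prev @ W_h, gamma_h) + bias
    return torch.tanh(pre)
```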

Recurrent Memory Array Structures [article]

Kamil Rocki
2016 arXiv   pre-print
The following report introduces ideas augmenting the standard Long Short-Term Memory (LSTM) architecture with multiple memory cells per hidden unit in order to improve its generalization capabilities.  ...  Acknowledgements This work has been supported in part by the Defense Advanced Research Projects Agency (DARPA).  ...  , including Recurrent Dropout (Zaremba et al., 2014), is caused by the fact that all or almost all hidden states' activations can be zeros.  ...
arXiv:1607.03085v3 fatcat:c7m2dn7lbjba5j76o73vvtaoli
Showing results 1 — 15 out of 30 results