Recurrent Batch Normalization [article]

Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville
2017 arXiv pre-print
We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
arXiv:1603.09025v5
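
To make the reparameterization described in the abstract concrete, here is a minimal NumPy sketch of a single batch-normalized LSTM time step. It is an illustration under stated assumptions, not the authors' implementation: the function and parameter names (`bn`, `gamma_x`, `gamma_h`, `gamma_c`, `beta_c`) are hypothetical, and the per-time-step statistics and test-time running averages discussed in the paper are omitted for brevity.

```python
# Illustrative sketch of one BN-LSTM time step (assumptions noted above).
import numpy as np

def bn(x, gamma, beta=0.0, eps=1e-5):
    """Batch-normalize x over the batch dimension (axis 0)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bn_lstm_step(x_t, h_prev, c_prev, Wx, Wh, b,
                 gamma_x, gamma_h, gamma_c, beta_c):
    """One time step: batch normalization is applied separately to the
    input-to-hidden and hidden-to-hidden projections, and to the cell
    state before it feeds into the output gate."""
    gates = bn(x_t @ Wx, gamma_x) + bn(h_prev @ Wh, gamma_h) + b
    i, f, o, g = np.split(gates, 4, axis=1)   # input, forget, output, candidate
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(bn(c_t, gamma_c, beta_c))
    return h_t, c_t
```

The two projections are normalized separately (each with its own gain and no bias) so that neither statistic is dominated by the other, matching the structure the abstract describes; biases enter only through the shared offset `b` and the cell-state normalization.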