An Exploration of Dropout with RNNs for Natural Language Inference
[article]
2018
pre-print
Dropout is a crucial regularization technique for the Recurrent Neural Network (RNN) models of Natural Language Inference (NLI). ...
In this paper, we propose a novel RNN model for NLI and empirically evaluate the effect of applying dropout at different layers in the model. ...
The inherent complexities and ambiguities in natural language text make NLU challenging for computers. The task of Natural Language Inference (NLI) is a fundamental step towards NLU [14]. ...
doi:10.1007/978-3-030-01424-7_16
arXiv:1810.08606v1
fatcat:6jpryt5xgjdvvb3sdflyp7hjju
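This paper asks where in an RNN-based NLI model dropout is most useful. As a rough illustration of the placement choices involved (a sketch, not the authors' architecture), a PyTorch model with dropout at the embedding, recurrent-output, and pre-classifier layers could look like this:

```python
# Hypothetical sketch: an LSTM sentence encoder for NLI with dropout applied at
# three common sites (embedding output, LSTM output, pre-classifier). This
# illustrates dropout placement only; it is not the architecture from the paper.
import torch
import torch.nn as nn

class NLIEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=512, n_classes=3, p=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.emb_drop = nn.Dropout(p)            # dropout on word embeddings
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out_drop = nn.Dropout(p)            # dropout on recurrent outputs
        self.classifier = nn.Sequential(
            nn.Linear(4 * 2 * hidden, hidden),   # [prem; hyp; |prem - hyp|; prem * hyp]
            nn.ReLU(),
            nn.Dropout(p),                       # dropout before the output layer
            nn.Linear(hidden, n_classes),
        )

    def encode(self, tokens):
        x = self.emb_drop(self.embed(tokens))
        h, _ = self.lstm(x)
        return self.out_drop(h.max(dim=1).values)   # max-pool over time

    def forward(self, premise, hypothesis):
        p, h = self.encode(premise), self.encode(hypothesis)
        feats = torch.cat([p, h, (p - h).abs(), p * h], dim=-1)
        return self.classifier(feats)
```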
Dropout during inference as a model for neurological degeneration in an image captioning network
[article]
2018
arXiv
pre-print
We evaluate the effects of dropout on language production by measuring the KL-divergence of word frequency distributions and other linguistic metrics as dropout is added. ...
We find that the generated sentences most closely approximate the word frequency distribution of the training corpus when using a moderate dropout of 0.4 during inference. ...
To our knowledge, dropout during inference in RNNs has not been studied. ...
arXiv:1808.03747v1
fatcat:dnvqy53mmbfpjn2avj6tr7lcaa
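Here dropout is deliberately left on at test time. In PyTorch, one way to get that behavior (an assumed implementation, not the authors' code) is to switch only the dropout modules back into training mode after `model.eval()`:

```python
# Sketch: keep dropout stochastic at inference time while everything else
# (e.g. batch norm) stays in eval mode. Assumes a standard PyTorch model.
import torch.nn as nn

def enable_inference_dropout(model: nn.Module, p: float = 0.4) -> None:
    """Re-enable dropout layers after model.eval(), optionally overriding p."""
    model.eval()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p          # e.g. the moderate 0.4 rate discussed above
            module.train()        # dropout stays stochastic during inference
```

Repeated forward passes then yield varying captions whose word-frequency statistics can be compared against the training corpus (e.g. via KL-divergence), as the abstract describes.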
Quasi-Recurrent Neural Networks
[article]
2016
arXiv
pre-print
Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy ...
Experiments on language modeling, sentiment classification, and character-level neural machine translation demonstrate these advantages and underline the viability of QRNNs as a basic building block for ...
RELATED WORK Exploring alternatives to traditional RNNs for sequence tasks is a major area of current research. ...
arXiv:1611.01576v2
fatcat:26zrh2glqfgqpol7qhd5d4ptja
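QRNNs replace the per-timestep matrix multiplications of an LSTM with convolutions applied in parallel across time, leaving only a cheap element-wise recurrence (fo-pooling). A simplified sketch of one layer, based on my reading of the paper rather than the released code:

```python
# Sketch of a QRNN layer with fo-pooling: gates come from convolutions
# (parallel across time), and only an element-wise recurrence remains.
import torch
import torch.nn as nn

class QRNNLayer(nn.Module):
    def __init__(self, in_dim, hidden, kernel_size=2):
        super().__init__()
        # One convolution producing candidate z, forget gate f, output gate o.
        self.conv = nn.Conv1d(in_dim, 3 * hidden, kernel_size,
                              padding=kernel_size - 1)
        self.hidden = hidden

    def forward(self, x):                    # x: (batch, time, in_dim)
        t = x.size(1)
        g = self.conv(x.transpose(1, 2))[:, :, :t]   # keep only causal outputs
        z, f, o = g.chunk(3, dim=1)
        z, f, o = torch.tanh(z), torch.sigmoid(f), torch.sigmoid(o)
        c = torch.zeros(x.size(0), self.hidden, device=x.device)
        hs = []
        for step in range(t):                # element-wise recurrence only
            c = f[:, :, step] * c + (1 - f[:, :, step]) * z[:, :, step]
            hs.append(o[:, :, step] * c)
        return torch.stack(hs, dim=1)        # (batch, time, hidden)
```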
Self-Attention Aligner: A Latency-Control End-to-End Model for ASR Using Self-Attention Network and Chunk-Hopping
[article]
2019
arXiv
pre-print
Self-attention network, an attention-based feedforward neural network, has recently shown the potential to replace recurrent neural networks (RNNs) in a variety of NLP tasks. ...
After jointly training with a self-attention network language model, our SAA model obtains further error rate reduction on multiple datasets. ...
Besides, the sequential nature of RNNs leads to low parallelization and slow computation speed. ...
arXiv:1902.06450v1
fatcat:xujp65wgivhsbn2p65clvgmtcm
Evaluating the Effectiveness of Efficient Neural Architecture Search for Sentence-Pair Tasks
[article]
2020
arXiv
pre-print
on a variety of natural language processing and computer vision tasks, including language modeling, natural language inference, and image classification. ...
We use ENAS to perform a micro-level search and learn a task-optimized RNN cell architecture as a drop-in replacement for an LSTM. ...
in layer 1 and an LSTM in layer 2, and finally 200 trials of HPT for the configuration with an LSTM in layer 1 and an ENAS-RNN in layer 2. ...
arXiv:2010.04249v1
fatcat:matlqpdepvaplgiy5xmmk3buwu
Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums
[article]
2021
arXiv
pre-print
In this paper, we explore for the first time Bayesian deep learning on learner-based text posts with two methods: Monte Carlo Dropout and Variational Inference, as a new solution to assessing the need ...
This problem has been studied as a Natural Language Processing (NLP) problem recently, and is known to be challenging, due to the imbalance of the data and the complex nature of the task. ...
The most widely adopted approximation method is the Monte Carlo Dropout [12], with applications in natural language processing, data analytics and computer vision [13] [22] [23] [42] [40]. ...
arXiv:2104.12643v1
fatcat:s6t6rwg45bdlfndrxerpojjzue
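Monte Carlo Dropout approximates Bayesian inference by keeping dropout stochastic at test time, averaging several forward passes, and reading uncertainty off their spread. A generic sketch (not the paper's classifier):

```python
# Sketch of Monte Carlo Dropout: run several stochastic forward passes with
# dropout enabled, use the mean as the prediction and the variance as uncertainty.
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, passes: int = 20):
    model.eval()
    for m in model.modules():                 # re-enable only the dropout layers
        if isinstance(m, nn.Dropout):
            m.train()
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(passes)])
    return probs.mean(dim=0), probs.var(dim=0)   # prediction, per-class uncertainty
```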
Neural Tree Indexers for Text Understanding
2017
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
We implemented and evaluated a binary-tree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, ...
In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. ...
Acknowledgments We would like to thank the anonymous reviewers for their insightful comments and suggestions. ...
doi:10.18653/v1/e17-1002
dblp:conf/eacl/YuM17
fatcat:ljh6vi3w3nepbli326656qav7a
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
[article]
2020
arXiv
pre-print
This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding instead of Transformer encoders. ...
The model is trained with the RNN-T loss well-suited to streaming decoding. ...
In this paper, we explore the possibility of replacing RNN-based audio and label encoders in the conventional RNN-T architecture with Transformer encoders. ...
arXiv:2002.02562v2
fatcat:n7zabn3rubav7ce4nb5y6puez4
Bayesian Recurrent Neural Networks
[article]
2019
arXiv
pre-print
We also empirically demonstrate how Bayesian RNNs are superior to traditional RNNs on a language modelling benchmark and an image captioning task, as well as showing how each of these methods improve our ...
In this work we explore a straightforward variational Bayes scheme for Recurrent Neural Networks. ...
Rezende, James Kirkpatrick, Alex Graves, Jacob Menick, Yori Zwols, Frederick Besse and many others at DeepMind for insightful discussions and feedback on this work. ...
arXiv:1704.02798v4
fatcat:ac452clc2bfd3ogyyu2csuiw7m
Neural Tree Indexers for Text Understanding
[article]
2017
arXiv
pre-print
We implemented and evaluated a binary-tree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, ...
In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. ...
Acknowledgments We would like to thank the anonymous reviewers for their insightful comments and suggestions. ...
arXiv:1607.04492v2
fatcat:wywffwfaonbrvhevuu2xsgb3le
Neural Tree Indexers for Text Understanding
2017
Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings
We implemented and evaluated a binary-tree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, ...
In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. ...
Acknowledgments We would like to thank the anonymous reviewers for their insightful comments and suggestions. ...
pmid:29081577
pmcid:PMC5657441
fatcat:lbfx6koqhjbxnd4qp7zmleachy
Shortcut-Stacked Sentence Encoders for Multi-Domain Inference
[article]
2017
arXiv
pre-print
We present a simple sequential sentence encoder for multi-domain natural language inference. ...
Our encoder is based on stacked bidirectional LSTM-RNNs with shortcut connections and fine-tuning of word embeddings. ...
This work was partially supported by a Google Faculty Research Award, an IBM Faculty Award, a Bloomberg Data Science Research Grant, and NVidia GPU awards. ...
arXiv:1708.02312v2
fatcat:nc7uqnx3gzatpndqmikrcqosuy
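The shortcut-stacked encoder feeds each biLSTM layer the word embeddings concatenated with every lower layer's outputs, then max-pools the top layer over time. A condensed sketch based on the abstract (hidden sizes are illustrative, not the paper's exact configuration):

```python
# Sketch of a shortcut-stacked sentence encoder: each biLSTM layer sees the
# word embeddings concatenated with all previous layers' hidden states.
import torch
import torch.nn as nn

class ShortcutStackedEncoder(nn.Module):
    def __init__(self, emb_dim=300, hidden_sizes=(512, 1024, 2048)):
        super().__init__()
        layers, in_dim = [], emb_dim
        for h in hidden_sizes:
            layers.append(nn.LSTM(in_dim, h, batch_first=True, bidirectional=True))
            in_dim += 2 * h                  # shortcut: grow the next layer's input
        self.layers = nn.ModuleList(layers)

    def forward(self, emb):                  # emb: (batch, time, emb_dim)
        inp = emb
        for lstm in self.layers:
            out, _ = lstm(inp)
            inp = torch.cat([inp, out], dim=-1)   # pass shortcuts upward
        return out.max(dim=1).values         # max-pool the top layer over time
```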
Representation learning for neural population activity with Neural Data Transformers
[article]
2021
arXiv
pre-print
This structure can be accurately captured using state space models with explicit dynamics, such as those based on recurrent neural networks (RNNs). ...
Further, its non-recurrence enables 3.9ms inference, well within the loop time of real-time applications and more than 6 times faster than recurrent baselines on the monkey reaching dataset. ...
Measurements were taken on a machine (on CPU) with 32GB RAM and a 4-core i7-4790K processor running at 4.2 GHz. ...
arXiv:2108.01210v1
fatcat:no7ooofgqfbjtlqewirzvfp6we
Shortcut-Stacked Sentence Encoders for Multi-Domain Inference
2017
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP
We present a simple sequential sentence encoder for multi-domain natural language inference. ...
Our encoder is based on stacked bidirectional LSTM-RNNs with shortcut connections and fine-tuning of word embeddings. ...
This work was partially supported by a Google Faculty Research Award, an IBM Faculty Award, a Bloomberg Data Science Research Grant, and NVidia GPU awards. ...
doi:10.18653/v1/w17-5308
dblp:conf/repeval/NieB17
fatcat:gozf4rgpsrgwrpbnfnoianjdga
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
[article]
2017
arXiv
pre-print
We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks. ...
We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields ...
We also thank IBM and Samsung for their support. We would also like to acknowledge the work of Pranav Shyam on learning RNN hierarchies. ...
arXiv:1606.01305v4
fatcat:v5rzre4s6vautfndckcsk3u2d4
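Zoneout stochastically copies some hidden units forward from the previous timestep instead of zeroing them, so identity connections preserve information and gradient flow. A minimal sketch wrapping a GRU cell (an illustration, not the authors' implementation):

```python
# Sketch of zoneout: with probability p, a hidden unit keeps its value from the
# previous timestep; otherwise it takes the newly computed value.
import torch
import torch.nn as nn

class ZoneoutGRUCell(nn.Module):
    def __init__(self, input_size, hidden_size, zoneout_p=0.15):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.p = zoneout_p

    def forward(self, x, h_prev):
        h_new = self.cell(x, h_prev)
        if self.training:
            keep = torch.bernoulli(torch.full_like(h_prev, self.p))
            return keep * h_prev + (1 - keep) * h_new   # preserve some units
        # At test time, use the expected value, as with standard dropout.
        return self.p * h_prev + (1 - self.p) * h_new
```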
Showing results 1 — 15 out of 3,938 results