
An Exploration of Dropout with RNNs for Natural Language Inference [article]

Amit Gajbhiye, Sardar Jaf, Noura Al Moubayed, A. Stephen McGough, Steven Bradley
2018 pre-print
Dropout is a crucial regularization technique for the Recurrent Neural Network (RNN) models of Natural Language Inference (NLI).  ...  In this paper, we propose a novel RNN model for NLI and empirically evaluate the effect of applying dropout at different layers in the model.  ...  The inherent complexities and ambiguities in natural language text make NLU challenging for computers. The task of Natural Language Inference (NLI) is a fundamental step towards NLU [14].  ... 
doi:10.1007/978-3-030-01424-7_16 arXiv:1810.08606v1 fatcat:6jpryt5xgjdvvb3sdflyp7hjju
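Not the paper's model, but a minimal PyTorch sketch of the idea the snippet above describes: a sentence encoder where dropout can be applied independently after the embedding layer, after the recurrent layer, and before the classifier, so the effect of each placement can be compared. All module names, dimensions, and the single-sentence setup (rather than a premise/hypothesis pair) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RNNEncoderWithDropout(nn.Module):
    """Hypothetical sentence encoder: dropout can be toggled per layer."""
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=300,
                 p_embed=0.0, p_recurrent=0.0, p_classifier=0.0, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.embed_dropout = nn.Dropout(p_embed)             # after the embedding layer
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.recurrent_dropout = nn.Dropout(p_recurrent)     # on the RNN outputs
        self.classifier_dropout = nn.Dropout(p_classifier)   # before the classifier
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embed_dropout(self.embedding(token_ids))
        outputs, _ = self.rnn(x)
        sent = self.recurrent_dropout(outputs[:, -1])         # last hidden state as sentence vector
        return self.classifier(self.classifier_dropout(sent))

# Example: dropout applied only after the embedding layer.
model = RNNEncoderWithDropout(p_embed=0.3)
logits = model(torch.randint(0, 10000, (4, 20)))  # batch of 4 sentences, 20 tokens each
print(logits.shape)  # torch.Size([4, 3])
```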

Dropout during inference as a model for neurological degeneration in an image captioning network [article]

Bai Li, Ran Zhang, Frank Rudzicz
2018 arXiv   pre-print
We evaluate the effects of dropout on language production by measuring the KL-divergence of word frequency distributions and other linguistic metrics as dropout is added.  ...  We find that the generated sentences most closely approximate the word frequency distribution of the training corpus when using a moderate dropout of 0.4 during inference.  ...  To our knowledge, dropout during inference in RNNs has not been studied.  ... 
arXiv:1808.03747v1 fatcat:dnvqy53mmbfpjn2avj6tr7lcaa
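A hedged sketch of the evaluation idea in the snippet above: keep dropout active while generating (in PyTorch, e.g. by leaving the model in train mode or calling F.dropout with training=True), then compare the word-frequency distribution of the generated text against the training corpus via KL divergence. The helper and the toy token lists below are illustrative, not the paper's code.

```python
from collections import Counter
import math

def word_freq_kl(reference_tokens, generated_tokens, eps=1e-8):
    """KL(P_reference || Q_generated) over word-frequency distributions,
    with additive smoothing so unseen words do not give infinite divergence."""
    vocab = set(reference_tokens) | set(generated_tokens)
    ref_counts, gen_counts = Counter(reference_tokens), Counter(generated_tokens)
    ref_total = len(reference_tokens) + eps * len(vocab)
    gen_total = len(generated_tokens) + eps * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (ref_counts[w] + eps) / ref_total
        q = (gen_counts[w] + eps) / gen_total
        kl += p * math.log(p / q)
    return kl

# Toy usage: captions produced with inference-time dropout vs. the training corpus.
training_corpus = "a dog runs on the grass a cat sits on the mat".split()
generated = "a dog dog runs the the grass cat".split()
print(round(word_freq_kl(training_corpus, generated), 4))
```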

Quasi-Recurrent Neural Networks [article]

James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
2016 arXiv   pre-print
Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy  ...  Experiments on language modeling, sentiment classification, and character-level neural machine translation demonstrate these advantages and underline the viability of QRNNs as a basic building block for  ...  RELATED WORK Exploring alternatives to traditional RNNs for sequence tasks is a major area of current research.  ... 
arXiv:1611.01576v2 fatcat:26zrh2glqfgqpol7qhd5d4ptja
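The snippet above does not show the mechanism, so here is a minimal NumPy sketch of the QRNN's "fo-pooling" recurrence as described in the paper: the gate sequences come from convolutions over time (computed fully in parallel), and only this cheap element-wise recurrence runs sequentially. The random arrays stand in for the convolution outputs.

```python
import numpy as np

def fo_pool(z, f, o, c0=None):
    """QRNN fo-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t,  h_t = o_t * c_t.
    z, f, o have shape (timesteps, hidden); in the full model they come from
    masked convolutions over the input sequence."""
    T, H = z.shape
    c = np.zeros(H) if c0 is None else c0
    h = np.empty((T, H))
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * z[t]
        h[t] = o[t] * c
    return h

# Toy gate activations standing in for convolution outputs (tanh / sigmoid).
rng = np.random.default_rng(0)
T, H = 5, 4
z = np.tanh(rng.normal(size=(T, H)))
f = 1.0 / (1.0 + np.exp(-rng.normal(size=(T, H))))
o = 1.0 / (1.0 + np.exp(-rng.normal(size=(T, H))))
print(fo_pool(z, f, o).shape)  # (5, 4)
```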

Self-Attention Aligner: A Latency-Control End-to-End Model for ASR Using Self-Attention Network and Chunk-Hopping [article]

Linhao Dong, Feng Wang, Bo Xu
2019 arXiv   pre-print
Self-attention network, an attention-based feedforward neural network, has recently shown the potential to replace recurrent neural networks (RNNs) in a variety of NLP tasks.  ...  After jointly training with a self-attention network language model, our SAA model obtains further error rate reduction on multiple datasets.  ...  Besides, the sequential nature of RNNs leads to low parallelization and slow computation speed.  ... 
arXiv:1902.06450v1 fatcat:xujp65wgivhsbn2p65clvgmtcm

Evaluating the Effectiveness of Efficient Neural Architecture Search for Sentence-Pair Tasks [article]

Ansel MacLaughlin, Jwala Dhamala, Anoop Kumar, Sriram Venkatapathy, Ragav Venkatesan, Rahul Gupta
2020 arXiv   pre-print
on a variety of natural language processing and computer vision tasks, including language modeling, natural language inference, and image classification.  ...  We use ENAS to perform a micro-level search and learn a task-optimized RNN cell architecture as a drop-in replacement for an LSTM.  ...  in layer 1 and an LSTM in layer 2, and finally 200 trials of HPT for the configuration with an LSTM in layer 1 and an ENAS-RNN in layer 2.  ... 
arXiv:2010.04249v1 fatcat:matlqpdepvaplgiy5xmmk3buwu

Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums [article]

Jialin Yu, Laila Alrajhi, Anoushka Harit, Zhongtian Sun, Alexandra I. Cristea, Lei Shi
2021 arXiv   pre-print
In this paper, we explore for the first time Bayesian deep learning on learner-based text posts with two methods: Monte Carlo Dropout and Variational Inference, as a new solution to assessing the need  ...  This problem has been studied as a Natural Language Processing (NLP) problem recently, and is known to be challenging, due to the imbalance of the data and the complex nature of the task.  ...  The most widely adopted approximation method is the Monte Carlo Dropout [12] , with applications in natural language processing, data analytics and computer vision [13] [22] [23] [42] [40] .  ... 
arXiv:2104.12643v1 fatcat:s6t6rwg45bdlfndrxerpojjzue
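A minimal PyTorch sketch of Monte Carlo Dropout as used for uncertainty estimation: run several stochastic forward passes with dropout left on at test time, average the predictive distributions, and use predictive entropy as an uncertainty signal. The TextClassifier module and feature dimensions below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    """Placeholder post classifier; the dropout layer is what matters here."""
    def __init__(self, in_dim=128, num_classes=2, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(p),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    """Monte Carlo Dropout: keep dropout active at test time and average
    the predictive distribution over several stochastic forward passes."""
    model.train()  # leaves dropout switched on (no gradients are taken)
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy  # entropy acts as an uncertainty score

model = TextClassifier()
features = torch.randn(8, 128)  # e.g. encoded forum posts
mean_probs, uncertainty = mc_dropout_predict(model, features)
print(mean_probs.shape, uncertainty.shape)  # torch.Size([8, 2]) torch.Size([8])
```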

Neural Tree Indexers for Text Understanding

Tsendsuren Munkhdalai, Hong Yu
2017 Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers  
We implemented and evaluated a binary-tree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection,  ...  In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language.  ...  Acknowledgments We would like to thank the anonymous reviewers for their insightful comments and suggestions.  ... 
doi:10.18653/v1/e17-1002 dblp:conf/eacl/YuM17 fatcat:ljh6vi3w3nepbli326656qav7a
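The NTI entry above describes composing token vectors recursively over a (binary) tree rather than strictly left to right. A toy NumPy sketch of that bottom-up composition, with a single affine-plus-tanh node function standing in for the paper's learned node module:

```python
import numpy as np

def compose(left, right, W):
    """Stand-in node composition: affine map + tanh over the concatenated
    child vectors (the paper uses a learned LSTM-style node function)."""
    return np.tanh(W @ np.concatenate([left, right]))

def encode_tree(node, leaf_vectors, W):
    """node is either a leaf index or a (left, right) pair; compose bottom-up."""
    if isinstance(node, int):
        return leaf_vectors[node]
    left, right = node
    return compose(encode_tree(left, leaf_vectors, W),
                   encode_tree(right, leaf_vectors, W), W)

# Toy example: 4 word vectors composed over the full binary tree ((0,1),(2,3)).
dim = 8
rng = np.random.default_rng(1)
leaves = rng.normal(size=(4, dim))
W = rng.normal(size=(dim, 2 * dim)) * 0.1
root = encode_tree(((0, 1), (2, 3)), leaves, W)
print(root.shape)  # (8,) -- the sentence representation at the root
```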

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss [article]

Qian Zhang, Han Lu, Hasim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar
2020 arXiv   pre-print
This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding instead of Transformer encoders.  ...  The model is trained with the RNN-T loss well-suited to streaming decoding.  ...  In this paper, we explore the possibility of replacing RNN-based audio and label encoders in the conventional RNN-T architecture with Transformer encoders.  ... 
arXiv:2002.02562v2 fatcat:n7zabn3rubav7ce4nb5y6puez4
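For context on the transducer setup the snippet refers to, below is a minimal PyTorch sketch of a typical transducer joint network: it combines each audio-encoder frame with each label-encoder step and scores the next token or blank, and it is agnostic to whether the encoders are RNNs or Transformers. Layer names and dimensions are assumptions, not the paper's exact configuration, and the RNN-T loss itself is not shown.

```python
import torch
import torch.nn as nn

class TransducerJoint(nn.Module):
    """Typical transducer joint network: combines every audio-encoder frame
    with every label-encoder step and scores the next output token (or blank)."""
    def __init__(self, audio_dim=256, label_dim=256, joint_dim=256, vocab_size=100):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, joint_dim)
        self.label_proj = nn.Linear(label_dim, joint_dim)
        self.out = nn.Linear(joint_dim, vocab_size + 1)  # +1 for the blank symbol

    def forward(self, audio_enc, label_enc):
        # audio_enc: (B, T, audio_dim), label_enc: (B, U, label_dim)
        a = self.audio_proj(audio_enc).unsqueeze(2)   # (B, T, 1, joint_dim)
        l = self.label_proj(label_enc).unsqueeze(1)   # (B, 1, U, joint_dim)
        return self.out(torch.tanh(a + l))            # (B, T, U, vocab_size + 1)

joint = TransducerJoint()
logits = joint(torch.randn(2, 50, 256), torch.randn(2, 10, 256))
print(logits.shape)  # torch.Size([2, 50, 10, 101]) -- fed to the transducer loss
```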

Bayesian Recurrent Neural Networks [article]

Meire Fortunato, Charles Blundell, Oriol Vinyals
2019 arXiv   pre-print
We also empirically demonstrate how Bayesian RNNs are superior to traditional RNNs on a language modelling benchmark and an image captioning task, as well as showing how each of these methods improves our  ...  In this work we explore a straightforward variational Bayes scheme for Recurrent Neural Networks.  ...  Rezende, James Kirkpatrick, Alex Graves, Jacob Menick, Yori Zwols, Frederick Besse and many others at DeepMind for insightful discussions and feedback on this work.  ... 
arXiv:1704.02798v4 fatcat:ac452clc2bfd3ogyyu2csuiw7m
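A minimal PyTorch sketch of the weight-level variational Bayes idea referred to above: each weight gets a learned Gaussian posterior (mean and scale), and a fresh weight sample is drawn per forward pass via the reparameterization trick. Shown for a single linear layer with made-up dimensions, not the paper's full RNN scheme or its KL and posterior-sharpening terms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Mean-field Gaussian weight posterior: w = mu + softplus(rho) * eps.
    Sampling fresh weights each forward pass is the core of variational
    Bayes over network weights (here for one layer, not a full RNN)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.w_rho = nn.Parameter(torch.full((out_dim, in_dim), -5.0))
        self.b = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        sigma = F.softplus(self.w_rho)
        w = self.w_mu + sigma * torch.randn_like(sigma)  # reparameterization trick
        return F.linear(x, w, self.b)

layer = BayesianLinear(16, 4)
x = torch.randn(3, 16)
y1, y2 = layer(x), layer(x)
print(y1.shape, torch.allclose(y1, y2))  # stochastic weights -> typically False
```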

Neural Tree Indexers for Text Understanding [article]

Tsendsuren Munkhdalai, Hong Yu
2017 arXiv   pre-print
We implemented and evaluated a binarytree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection,  ...  In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language.  ...  Acknowledgments We would like to thank the anonymous reviewers for their insightful comments and suggestions.  ... 
arXiv:1607.04492v2 fatcat:wywffwfaonbrvhevuu2xsgb3le

Neural Tree Indexers for Text Understanding

Tsendsuren Munkhdalai, Hong Yu
2017 Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings  
We implemented and evaluated a binary-tree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection,  ...  In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language.  ...  Acknowledgments We would like to thank the anonymous reviewers for their insightful comments and suggestions.  ... 
pmid:29081577 pmcid:PMC5657441 fatcat:lbfx6koqhjbxnd4qp7zmleachy

Shortcut-Stacked Sentence Encoders for Multi-Domain Inference [article]

Yixin Nie, Mohit Bansal
2017 arXiv   pre-print
We present a simple sequential sentence encoder for multi-domain natural language inference.  ...  Our encoder is based on stacked bidirectional LSTM-RNNs with shortcut connections and fine-tuning of word embeddings.  ...  This work was partially supported by a Google Faculty Research Award, an IBM Faculty Award, a Bloomberg Data Science Research Grant, and NVidia GPU awards.  ... 
arXiv:1708.02312v2 fatcat:nc7uqnx3gzatpndqmikrcqosuy
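A minimal PyTorch sketch of the encoder described in the snippet above: stacked bidirectional LSTMs where each layer's input is the word embeddings concatenated with the outputs of all previous layers (the shortcut connections), followed by max-pooling over time to obtain the sentence vector. Vocabulary size and hidden dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ShortcutStackedEncoder(nn.Module):
    """Stacked BiLSTMs with shortcut connections: layer i sees the word
    embeddings concatenated with the outputs of all previous BiLSTM layers."""
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dims=(256, 256, 256)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.layers = nn.ModuleList()
        in_dim = embed_dim
        for h in hidden_dims:
            self.layers.append(nn.LSTM(in_dim, h, batch_first=True, bidirectional=True))
            in_dim += 2 * h  # the next layer also sees this layer's (bidirectional) outputs

    def forward(self, token_ids):
        emb = self.embedding(token_ids)                    # (B, T, embed_dim)
        inputs = emb
        for lstm in self.layers:
            out, _ = lstm(inputs)                          # (B, T, 2*h)
            inputs = torch.cat([inputs, out], dim=-1)      # shortcut: carry everything forward
        return out.max(dim=1).values                       # max-pool over time -> sentence vector

encoder = ShortcutStackedEncoder()
sent_vec = encoder(torch.randint(0, 10000, (4, 25)))
print(sent_vec.shape)  # torch.Size([4, 512])
```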

Representation learning for neural population activity with Neural Data Transformers [article]

Joel Ye, Chethan Pandarinath
2021 arXiv   pre-print
This structure can be accurately captured using state space models with explicit dynamics, such as those based on recurrent neural networks (RNNs).  ...  Further, its non-recurrence enables 3.9ms inference, well within the loop time of real-time applications and more than 6 times faster than recurrent baselines on the monkey reaching dataset.  ...  Measurements were taken on a machine (on CPU) with 32GB RAM and a 4-core i7-4790K processor running at 4.2 GHz. training for Natural Language Generation, Translation, and Comprehension.  ... 
arXiv:2108.01210v1 fatcat:no7ooofgqfbjtlqewirzvfp6we

Shortcut-Stacked Sentence Encoders for Multi-Domain Inference

Yixin Nie, Mohit Bansal
2017 Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP  
We present a simple sequential sentence encoder for multi-domain natural language inference.  ...  Our encoder is based on stacked bidirectional LSTM-RNNs with shortcut connections and fine-tuning of word embeddings.  ...  This work was partially supported by a Google Faculty Research Award, an IBM Faculty Award, a Bloomberg Data Science Research Grant, and NVidia GPU awards.  ... 
doi:10.18653/v1/w17-5308 dblp:conf/repeval/NieB17 fatcat:gozf4rgpsrgwrpbnfnoianjdga

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations [article]

David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal
2017 arXiv   pre-print
We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks.  ...  We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields  ...  We also thank IBM and Samsung for their support. We would also like to acknowledge the work of Pranav Shyam on learning RNN hierarchies.  ... 
arXiv:1606.01305v4 fatcat:v5rzre4s6vautfndckcsk3u2d4
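A minimal PyTorch sketch of the zoneout operation named in the title above: at each timestep, each hidden unit keeps its previous value with probability p instead of taking its updated value (in contrast to dropout, which zeroes it), with the expectation used at test time. The GRUCell is just a stand-in recurrent cell, not the paper's exact setup.

```python
import torch
import torch.nn as nn

def zoneout(h_prev, h_new, p=0.15, training=True):
    """Zoneout: each hidden unit keeps its previous value with probability p
    instead of taking its new value (unlike dropout, nothing is zeroed)."""
    if not training:
        # Expected-value behaviour at test time, analogous to standard dropout.
        return p * h_prev + (1.0 - p) * h_new
    mask = torch.bernoulli(torch.full_like(h_new, p))
    return mask * h_prev + (1.0 - mask) * h_new

# Toy usage inside an unrolled RNN loop, with a GRUCell standing in for the cell.
cell = nn.GRUCell(input_size=8, hidden_size=16)
x = torch.randn(5, 3, 8)                  # (timesteps, batch, features)
h = torch.zeros(3, 16)
for t in range(x.size(0)):
    h = zoneout(h, cell(x[t], h), p=0.15, training=True)
print(h.shape)  # torch.Size([3, 16])
```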
Showing results 1 — 15 out of 3,938 results