Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks [article]

Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh
2018 arXiv   pre-print
For example, for machine translation task on German to English dataset with around 25K vocabulary, we can achieve 20.4 times speed up with 98.9% precision@1 and 99.3% precision@5 with the original softmax  ...  Using the Gumbel softmax, we are able to train the screening model end-to-end on the training set to exploit data distribution.  ...  CONCLUSION In this paper, we proposed a new algorithm for fast softmax inference on large vocabulary neural language models.  ... 
arXiv:1810.12406v1 fatcat:kuc4hwgsdrblpntq26q24cdxuu

Stock Price Prediction Based on Natural Language Processing

Xiaobin Tang, Nuo Lei, Manru Dong, Dan Ma, Atila Bueno
2022 Complexity  
This study designs a new text mining method for keywords augmentation based on natural language processing models including Bidirectional Encoder Representation from Transformers (BERT) and Neural Contextualized  ...  Therefore, the keywords augmentation model designed in this study is helpful to provide references for other variable expansion in financial time series forecasting.  ...  With the development of computer's computing power, the deep learning language model based on large-scale neural networks has been realized.  ... 
doi:10.1155/2022/9031900 fatcat:qstlgv5dnre2jbcird3t2l6pou

Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT [article]

Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
2021 arXiv   pre-print
Further, we propose a cross-modal transfer learning method to refine semantics from a large-scale pre-trained language model BERT for improving the performance.  ...  At last, the probability distribution on the vocabulary is computed for each token position. Therefore, speech recognition is re-formulated as a position-wise classification problem.  ...  The authors are grateful to the anonymous reviewers for their invaluable comments that improve the completeness and readability of this paper.  ... 
arXiv:2102.07594v6 fatcat:6rtjstjwb5bhxg7bdks5ik6qxe

Neural Networks for Text Correction and Completion in Keyboard Decoding [article]

Shaona Ghosh, Per Ola Kristensson
2017 arXiv   pre-print
Neural Network (RNN) and Convolutional Neural Networks (CNN) for natural language understanding.  ...  This paper proposes a sequence-to-sequence neural attention network system for automatic text correction and completion.  ...  To the best of our knowledge, this is the first research body of work on deep neural networks for correction and completion.  ... 
arXiv:1709.06429v1 fatcat:zxtcemr76jb33f5axqffe6dhqi

Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference [article]

Hao Zhang, Bo Chen, Yulai Cong, Dandan Guo, Hongwei Liu, Mingyuan Zhou
2020 arXiv   pre-print
In order to provide scalable posterior inference for the parameters of the generative network, we develop topic-layer-adaptive stochastic gradient Riemannian MCMC that jointly learns simplex-constrained  ...  The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.  ...  Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, pp. 1527–1554, 2006. [48] M.  ... 
arXiv:2006.08804v1 fatcat:px4gousafnehtf3w55tzeohweu

Memory-based Parameter Adaptation [article]

Pablo Sprechmann, Siddhant M. Jayakumar, Jack W. Rae, Alexander Pritzel, Adrià Puigdomènech Badia, Benigno Uria, Oriol Vinyals, Demis Hassabis, Razvan Pascanu, Charles Blundell
2018 arXiv   pre-print
Deep neural networks have excelled on a wide range of problems, from vision to language and game playing.  ...  , and fast learning during evaluation.  ...  ACKNOWLEDGMENTS We would like to thank Gabor Melis for providing the LSTM baselines on the language tasks.  ... 
arXiv:1802.10542v1 fatcat:56h6yirgufabvixwp7cw5apgem

Sentiment Classification towards Question-Answering with Hierarchical Matching Network

Chenlin Shen, Changlong Sun, Jingjing Wang, Yangyang Kang, Shoushan Li, Xiaozhong Liu, Luo Si, Min Zhang, Guodong Zhou
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
On the basis, we propose a three-stage hierarchical matching network to explore deep sentiment information in a QA text pair.  ...  In this study, we propose a novel task/method to address QA sentiment analysis.  ...  Acknowledgments We would like to thank the anonymous reviewers for their valuable comments.  ... 
doi:10.18653/v1/d18-1401 dblp:conf/emnlp/ShenSWKLLSZZ18 fatcat:3e4qt7qiwvgrtm2dwlzkrsglyq

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes [article]

Derya Soydaner
2022 arXiv   pre-print
The goal of this paper is to provide an overview from the early work on searching for ways to implement attention idea with neural networks until the recent trends.  ...  A long time ago in the machine learning literature, the idea of incorporating a mechanism inspired by the human visual system into neural networks was introduced.  ...  The idea of incorporating attention mechanisms into deep neural networks has led to state-of-the-art results for a large variety of tasks.  ... 
arXiv:2204.13154v1 fatcat:lziyvfr5gfgp5limgpm4cizgxq

On Architectures for Including Visual Information in Neural Language Models for Image Description [article]

Marc Tanti, Albert Gatt, Kenneth P. Camilleri
2019 arXiv   pre-print
We also observe that the merge architecture can have its recurrent neural network pre-trained in a text-only language model (transfer learning) rather than be initialised randomly as usual.  ...  We analyse these four architectures and conclude that the best performing one is init-inject, which is when the visual information is injected into the initial state of the recurrent neural network.  ...  The result of the softmax is a vector of probabilities that sum to 1, one for each class. An interesting feature of neural networks is their expressiveness.  ... 
arXiv:1911.03738v1 fatcat:ncbx3ee22nhmfjh4u6mierrrbu

News Text Classification Method Based on the GRU_CNN Model

Lujuan Deng, Qingxia Ge, Jiaxue Zhang, Zuhe Li, Zeqi Yu, Tiantian Yin, Hanxue Zhu, Raghavan Dhanasekaran
2022 International Transactions on Electrical Energy Systems  
To address this problem, this paper proposes a news text classification method based on the GRU_CNN model, which combines the advantages of CNN and GRU.  ...  The convolutional neural network can extract local features of text but cannot capture structure information or semantic relationships between words, and a single CNN model's classification performance  ...  Among text categorization models based on deep learning, the convolutional neural network is a shallow neural network model with good performance and fast training speed.  ... 
doi:10.1155/2022/1197534 fatcat:xk4bzpparjacnpo2mdtqoeudb4

Myanmar named entity corpus and its use in syllable-based neural named entity recognition

Hsu Myat Mo, Khin Mar Soe
2020 International Journal of Electrical and Computer Engineering (IJECE)  
This work also aims to discover the effectiveness of neural network approaches to textual processing for Myanmar language as well as to promote future research works on this understudied language.  ...  This work also contributes the first evaluation of various deep neural network architectures on Myanmar Named Entity Recognition.  ...  than softmax in inference layer.  ... 
doi:10.11591/ijece.v10i2.pp1544-1551 fatcat:ijmmsb7qnrfffmzetnnlgpcb7q

Transfer Learning for Context-Aware Spoken Language Understanding

Qian Chen, Zhu Zhuo, Wen Wang, Qiuyun Xu
2019 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)  
We explore different transfer learning approaches to reduce dependency on data collection and annotation.  ...  In addition to unsupervised pre-training using large-scale general purpose unlabeled corpora, such as Wikipedia, we explore unsupervised and supervised adaptive training approaches for transfer learning  ...  The learned WordPiece embeddings [23] are used to alleviate the out-of-vocabulary (OOV) problem. The learned position embeddings are used to capture the sequence order information.  ... 
doi:10.1109/asru46091.2019.9003902 dblp:conf/asru/ChenZWX19 fatcat:uotw5f7efrgbfh7jg7t67363gu

The Poisson Gamma Belief Network [article]

Mingyuan Zhou, Yulai Cong, Bo Chen
2015 arXiv   pre-print
To infer a multilayer representation of high-dimensional count vectors, we propose the Poisson gamma belief network (PGBN) that factorizes each of its layers into the product of a connection weight matrix  ...  Example results on text analysis illustrate interesting relationships between the width of the first layer and the inferred network structure, and demonstrate that the PGBN, whose hidden units are imposed  ...  Note that K1 max = 800 is large enough to cover all active first-layer topics (inferred to be around 500 for both binary classification tasks), whereas all the first-layer topics would be used if K1 max  ... 
arXiv:1511.02199v2 fatcat:ut77saydlzharefb4mnhc7ks54

Extreme Classification (Dagstuhl Seminar 18291)

Samy Bengio, Krzysztof Dembczynski, Thorsten Joachims, Marius Kloft, Manik Varma, Michael Wagner
2019 Dagstuhl Reports  
Extreme classification is a rapidly growing research area within machine learning focusing on multi-class and multi-label problems involving an extremely large number of labels (even more than a million  ...  Extreme classification has also opened up a new paradigm for key industrial applications such as ranking and recommendation by reformulating them as multi-label learning tasks where each item to be ranked  ...  Nowadays, this conditional probability is usually estimated using neural networks, which implies computing a softmax over the full vocabulary.  ... 
doi:10.4230/dagrep.8.7.62 dblp:journals/dagstuhl-reports/BengioDJKV18 fatcat:tglxen4d4vc5vkxtllzy3xokl4

Neural Machine Translation [article]

Philipp Koehn
2017 arXiv   pre-print
Draft of textbook chapter on neural machine translation. A comprehensive treatment of the topic, ranging from introduction to neural networks, computation graphs, description of the currently dominant  ...  Written as chapter for the textbook Statistical Machine Translation. Used in the JHU Fall 2017 class on machine translation.  ...  On the other hand, neural methods are not well equipped to deal with such large vocabularies. The ideal representations for neural networks are continuous space vectors.  ... 
arXiv:1709.07809v1 fatcat:kj23sup7yfaxvllfha4v7xbugq