
Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition [article]

Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu
2017 arXiv   pre-print
Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks.  ...  In this paper, we evaluate existing lattice rescoring algorithms along with new variants on a YouTube speech recognition task.  ...  ACKNOWLEDGEMENTS We thank Jitong Chen, Michael Riley, Brian Roark, Hagen Soltau, David Rybach, Ciprian Chelba, Chris Alberti, Felix Stahlberg and Rafal Jozefowicz for helpful suggestions.  ... 
arXiv:1711.05448v1 fatcat:27iy5iox4fahhhegc6xj4n5onm

NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition

Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier
2016 Interspeech 2016  
We present NN-grams, a novel hybrid language model integrating n-grams and neural networks (NN) for speech recognition. The model takes as input both word histories and n-gram counts.  ...  We present results with noise samples derived from either an n-gram distribution or from speech recognition lattices.  ...  (RNNs) [7] and variants such as long short-term memory (LSTM) networks [8] have started outperforming n-gram models [9].  ...
doi:10.21437/interspeech.2016-1295 dblp:conf/interspeech/DamavandiKSB16 fatcat:zwyyyu66vbfynnl4ihqdyur2qy

NN-grams: Unifying neural network and n-gram language models for Speech Recognition [article]

Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier
2016 arXiv   pre-print
We present NN-grams, a novel hybrid language model integrating n-grams and neural networks (NN) for speech recognition. The model takes as input both word histories and n-gram counts.  ...  We present results with noise samples derived from either an n-gram distribution or from speech recognition lattices.  ...  Acknowledgements We would like to thank Kaisuke Nakajima, Xuedong Zhang, Francoise Beaufays, Chris Alberti and Rafal Jozefowicz for providing crucial support at various stages of this project.  ...
arXiv:1606.07470v1 fatcat:44c4cgasx5ardgw373hsfsf2pm

Context-aware RNNLM Rescoring for Conversational Speech Recognition [article]

Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie
2020 arXiv   pre-print
Conversational speech recognition is regarded as a challenging task due to its free-style speaking and long-term contextual dependencies.  ...  Prior work has explored the modeling of long-range context through RNNLM rescoring with improved performance.  ...  In [20] , Sundermeyer et al. combined the previous lattice decoding work with long short-term memory (LSTM) neural network language models and also investigated a refined pruning technique.  ... 
arXiv:2011.09301v1 fatcat:pjvxrmgfbncndevvfhx2t62pbe

The GTM-UVIGO System for Albayzin 2018 Speech-to-Text Evaluation

Laura Docío-Fernández, Carmen García-Mateo
2018 IberSPEECH 2018  
It uses a hybrid Deep Neural Network - Hidden Markov Model (DNN-HMM) for acoustic modeling, and rescores the trigram-based word lattices obtained in a first decoding stage with a four-gram language model or a language model based on a recurrent neural network.  ...  In more recent work, recurrent network topologies such as LSTM (Long Short-Term Memory) [7] have also been applied [8] [9] [10].  ...
doi:10.21437/iberspeech.2018-58 dblp:conf/iberspeech/FernandezG18 fatcat:57ndbad2aje3lpv6ata73hetoe

Improving the Automatic Speech Recognition through the improvement of Language Models

Andrés Piñeiro-Martín, Carmen García-Mateo, Laura Docío-Fernández
2018 IberSPEECH 2018  
Language models are one of the pillars on which the performance of automatic speech recognition systems is based.  ...  Experimental results showed that improving the quality of language models yields improvements in recognition performance.  ...  Our gratitude to the Ramón Piñeiro Institute of the Xunta de Galicia for allowing the use of the CORGA material and for its collaboration in the labeling of the second and third corpora.  ...
doi:10.21437/iberspeech.2018-8 dblp:conf/iberspeech/MartinGF18 fatcat:5nf7atju7va7tac7xkcwuinfgy

Exploiting deep neural networks for detection-based speech recognition

Sabato Marco Siniscalchi, Dong Yu, Li Deng, Chin-Hui Lee
2013 Neurocomputing  
This improved phoneme prediction accuracy, when integrated into a standard large vocabulary continuous speech recognition (LVCSR) system through a word lattice rescoring framework, results in improved word recognition accuracy, which is better than previously reported word lattice rescoring results.  ...  The rescoring formulation for word lattices is as follows: each arc in a lattice corresponds to a word in a string hypothesis.  ...
doi:10.1016/j.neucom.2012.11.008 fatcat:hm25jzvd5jgmpdmr4e45c7hyly
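The per-arc formulation quoted in this entry (each lattice arc corresponds to one word of a hypothesis) can be sketched as a small best-path search over a word lattice, with a new LM score swapped in on every arc. This is a minimal illustration, not the paper's implementation: the lattice, the words, and all log-probabilities below are invented, and the stand-in language model is just a fixed per-word table.

```python
# Toy lattice: arcs (src, dst, word, acoustic log-prob).
ARCS = [
    (0, 1, "the", -1.0),
    (0, 1, "a", -1.5),
    (1, 2, "cat", -0.5),
    (1, 2, "hat", -2.0),
]
# Hypothetical stand-in LM: fixed log-probability per word.
LM_LOGPROB = {"the": -0.7, "a": -1.2, "cat": -1.0, "hat": -3.0}

def rescore_lattice(arcs, lm, lm_weight=1.0, start=0, final=2):
    """Best-path search with rescored arc costs.

    Each arc's score is its acoustic log-prob plus the (new) LM
    log-prob of its word, mirroring the per-arc formulation where
    one lattice arc corresponds to one word of a hypothesis."""
    best = {start: (0.0, [])}  # node -> (best log-score, word sequence)
    # Process arcs in topological order (here node ids increase
    # along every path, so sorting by source node suffices).
    for src, dst, word, ac in sorted(arcs):
        if src not in best:
            continue
        score = best[src][0] + ac + lm_weight * lm[word]
        if dst not in best or score > best[dst][0]:
            best[dst] = (score, best[src][1] + [word])
    return best[final]

score, words = rescore_lattice(ARCS, LM_LOGPROB)
print(words)  # best path after rescoring: ['the', 'cat']
```

In a real rescoring pass the arc's original LM cost would be subtracted (or the lattice would carry separate acoustic and LM costs), and a neural LM score depending on the full word history would be added during the traversal, which is what makes exact lattice rescoring with RNN/LSTM LMs nontrivial.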

Improving N-Best Rescoring in Under-Resourced Code-Switched Speech Recognition Using Pretraining and Data Augmentation

Joshua Jansen van Vüren, Thomas Niesler
2022 Languages  
In this study, we present improvements in N-best rescoring of code-switched speech achieved by n-gram augmentation as well as optimised pretraining of long short-term memory (LSTM) language models with  ...  We conclude that the careful optimisation of the pretraining strategy used for neural network language models can offer worthwhile improvements in speech recognition accuracy even at language switches,  ...  Acknowledgments: We would like to thank the South African Centre for High Performance Computing (CHPC) for providing computational resources on their Lengau cluster for this research.  ... 
doi:10.3390/languages7030236 fatcat:yvnokeuckzeprjzphtrlg5sypi

The I2R's ASR System for the VOiCES from a Distance Challenge 2019

Tze Yuang Chong, Kye Min Tan, Kah Kuan Teh, Chang Huai You, Hanwu Sun, Huy Dat Tran
2019 Interspeech 2019  
Moreover, an LSTM language model was used to rescore the lattice to compensate for the weak n-gram model trained from only the transcription text.  ...  This paper describes the development of the automatic speech recognition (ASR) system for the submission to the VOiCES from a Distance Challenge 2019.  ...  LSTM Language Model Rescoring For language modeling, an LSTM language model [31, 32, 33] was used to rescore the lattices produced by the ASR systems.  ...
doi:10.21437/interspeech.2019-2130 dblp:conf/interspeech/ChongTTYSD19 fatcat:gk7z6qzonzecde4lncttq3ycfm

Noise-Robust ASR for the third 'CHiME' Challenge Exploiting Time-Frequency Masking based Multi-Channel Speech Enhancement and Recurrent Neural Network [article]

Zaihu Pang, Fengyun Zhu
2015 arXiv   pre-print
The state-of-the-art speech recognition techniques, namely recurrent neural network based acoustic and language modeling, state space minimum Bayes risk based discriminative acoustic modeling, and i-vector  ...  In this paper, the Lingban entry to the third 'CHiME' speech separation and recognition challenge is presented.  ...  ACKNOWLEDGMENT The authors would like to thank Zhiping Zhang, Xiangang Li, Yi Liu and Tong Fu for their kind help.  ...
arXiv:1509.07211v1 fatcat:5ysnjlvclzd3tifjaulwssmutq

Recurrent neural network-based language modeling for an automatic Russian speech recognition system

Irina Kipyatkova, Alexey Karpov
2015 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)  
In the paper, we describe research on recurrent neural network language models for N-best list rescoring in automatic continuous Russian speech recognition.  ...  We tried recurrent neural networks with different numbers of units in the hidden layer. We achieved a relative word error rate reduction of 14% with respect to the baseline 3-gram model.  ...  Long Short-Term Memory NN LM implementation.  ...
doi:10.1109/ainl-ismw-fruct.2015.7382966 fatcat:uhtstyiuprfjrfvpp37yj3sr5m
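The N-best rescoring described in this entry — second-pass RNN LM scores combined with first-pass 3-gram scores — can be sketched as a log-linear interpolation over the hypothesis list. A toy illustration with invented scores; `rescore_nbest` and the interpolation weight `lam` are hypothetical names, not taken from the paper.

```python
def rescore_nbest(nbest, lam=0.5):
    """Pick the best hypothesis under an interpolated score.

    nbest: list of (hypothesis, acoustic, ngram_lm, neural_lm)
    where the last three entries are log-scores. The LM score is a
    weighted mix of the first-pass n-gram score and the second-pass
    neural LM score, added to the acoustic score."""
    def combined(entry):
        _, acoustic, ngram, neural = entry
        return acoustic + (1 - lam) * ngram + lam * neural
    return max(nbest, key=combined)[0]

# Invented N-best list with made-up log-scores.
NBEST = [
    ("recognize speech", -10.0, -4.0, -3.0),
    ("wreck a nice beach", -9.5, -4.5, -6.0),
]
print(rescore_nbest(NBEST))  # -> recognize speech
```

In practice `lam` (and a separate LM scale against the acoustic score) would be tuned on a development set, which is how the kind of word error rate reductions reported in this listing are typically obtained.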

Combining Frame-Synchronous and Label-Synchronous Systems for Speech Recognition [article]

Qiujia Li, Chao Zhang, Philip C. Woodland
2021 arXiv   pre-print
In this paper, we propose rescoring the N-best hypotheses or lattices produced by a first-pass frame-synchronous system with a label-synchronous system in a second pass.  ...  Commonly used automatic speech recognition (ASR) systems can be classified into frame-synchronous and label-synchronous categories, based on whether the speech is decoded on a per-frame or per-label basis  ...  a stack of RNN layers, such as long short-term memory (LSTM) layers [6], [8], Transformer encoder blocks [22], and  ...  To remove the independence assumption across output tokens in  ...
arXiv:2107.00764v1 fatcat:htcvumzh3rd2vkdj2o4jqd3lna

BART based semantic correction for Mandarin automatic speech recognition system [article]

Yun Zhao, Xuerui Yang, Jinchao Wang, Yongyu Gao, Chao Yan, Yuanfu Zhou
2021 arXiv   pre-print
Although automatic speech recognition (ASR) systems have achieved significant improvements in recent years, spoken language recognition errors still occur that can be easily spotted by human beings.  ...  Various language modeling techniques have been developed for post-recognition tasks like semantic correction.  ...  In addition to our baseline model, we trained another acoustic model using 10% of the full speech data with short epochs for SC training data generation. 2. Introduce errors to acoustic features.  ...
arXiv:2104.05507v1 fatcat:jcrpxdg2rnc35avgbekg6jkcjm

Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech

Emre Yılmaz, Henk van den Heuvel, David van Leeuwen
2018 Interspeech 2018  
In this paper, we describe several techniques for improving the acoustic and language model of an automatic speech recognition (ASR) system operating on code-switching (CS) speech.  ...  In previous work, we have proposed several automatic transcription strategies for CS speech to increase the amount of available training speech data.  ...  CS text is generated either by training long short-term memory (LSTM) language models on the very small amount of CS text extracted from the transcriptions of the training speech data and synthesize much  ... 
doi:10.21437/interspeech.2018-52 dblp:conf/interspeech/YilmazHL18 fatcat:cawnnsaerje5jgiceyln5myjee

Application of LSTM Neural Networks in Language Modelling [chapter]

Daniel Soutner, Luděk Müller
2013 Lecture Notes in Computer Science  
Due to the difficulties in training RNNs, one way forward could be the Long Short-Term Memory (LSTM) neural network architecture.  ...  Artificial neural networks have become state-of-the-art in the task of language modelling on small corpora.  ...  Evaluating on speech recognition (1000-best list rescoring).  ...  We have applied the LSTM neural network language model to spontaneous Czech speech.  ...
doi:10.1007/978-3-642-40585-3_14 fatcat:zix7zl5owfdkxixkd52iyrfy5i
Showing results 1 — 15 out of 220 results