271 Hits in 6.4 sec

First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs [article]

Awni Y. Hannun, Andrew L. Maas, Daniel Jurafsky, Andrew Y. Ng
2014 arXiv   pre-print
We present a method to perform first-pass large vocabulary continuous speech recognition using only a neural network and language model.  ...  Experiments on the Wall Street Journal corpus demonstrate fairly competitive word error rates, and the importance of bi-directional network recurrence.  ...  Introduction Modern large vocabulary continuous speech recognition (LVCSR) systems are complex and difficult to modify.  ... 
arXiv:1408.2873v2 fatcat:kkllgsl74bd2tpsctefrlxk7z4

Dynamic Extension of ASR Lexicon Using Wikipedia Data

Badr Abdullah, Irina Illina, Dominique Fohr
2018 2018 IEEE Spoken Language Technology Workshop (SLT)  
Despite recent progress in developing Large Vocabulary Continuous Speech Recognition Systems (LVCSR), these systems suffer from Out-Of-Vocabulary words (OOV).  ...  These PNs are grouped in semantically similar classes using word embedding. We use a two-step approach: first, we select OOV PN pertinent classes with a multi-class Deep Neural Network (DNN).  ...  We perform a second pass speech recognition with the updated LVCSR system.  ... 
doi:10.1109/slt.2018.8639592 dblp:conf/slt/AbdullahIF18 fatcat:moviucldyjfqldg4g3dpavoda4

An Overview of End-to-End Automatic Speech Recognition

Dong Wang, Xiaodong Wang, Shaohe Lv
2019 Symmetry  
Automatic speech recognition, especially large vocabulary continuous speech recognition, is an important issue in the field of machine learning.  ...  But recently, the HMM-deep neural network (DNN) model and the end-to-end model using deep learning have achieved performance beyond HMM-GMM. Both use deep learning techniques,  ...  These advantages make the end-to-end model quickly become a hot research direction in large vocabulary continuous speech recognition (LVCSR).  ... 
doi:10.3390/sym11081018 fatcat:ea3ohiy765clzbj7yulonvz7eu

Lexicon-Free Conversational Speech Recognition with Neural Networks

Andrew Maas, Ziang Xie, Dan Jurafsky, Andrew Ng
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
This approach eliminates much of the complex infrastructure of modern speech recognition systems, making it possible to directly train a speech recognizer using errors generated by spoken language understanding  ...  We present an approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure.  ...  Acknowledgments We thank Awni Hannun for his contributions to the software used for experiments in this work.  ... 
doi:10.3115/v1/n15-1038 dblp:conf/naacl/MaasXJN15 fatcat:w4ar6emg2fhgnbnlw77nxct5za
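
The decoding pipeline this abstract describes pairs a character-emitting network with a character-level language model under beam search. As a toy illustration of the language-model component only, a smoothed character bigram model can be sketched as follows (the class name, training text, and add-alpha smoothing here are illustrative assumptions, not the paper's actual n-gram setup):

```python
import math
from collections import Counter

class CharBigramLM:
    """Toy add-alpha-smoothed character bigram language model."""

    def __init__(self, train_text, alpha=1.0):
        self.alpha = alpha
        # Count adjacent character pairs and single characters.
        self.bigrams = Counter(zip(train_text, train_text[1:]))
        self.unigrams = Counter(train_text)
        self.vocab = len(set(train_text))

    def log_prob(self, text):
        """Smoothed log P(text) as a sum of bigram log-probabilities."""
        lp = 0.0
        for a, b in zip(text, text[1:]):
            num = self.bigrams[(a, b)] + self.alpha
            den = self.unigrams[a] + self.alpha * self.vocab
            lp += math.log(num / den)
        return lp
```

In a system of the kind described, such log-probabilities would be interpolated with the network's per-character scores inside beam search; a real recognizer would use a much higher-order model trained on a large text corpus.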

Segmental Recurrent Neural Networks for End-to-end Speech Recognition [article]

Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals
2016 arXiv   pre-print
We achieved 17.3% phone error rate (PER) from the first-pass decoding, the best reported result using CRFs, despite using only a zeroth-order CRF and no language model.  ...  In this paper, we discuss practical training and decoding issues as well as the method to speed up the training in the context of speech recognition. We performed experiments on the TIMIT dataset.  ...  Using bi-directional RNNs is straightforward. Conditional Maximum Likelihood Training For speech recognition, the segmentation labels E are usually unknown, training  ... 
arXiv:1603.00223v2 fatcat:6mib4syijndq3ovn6mqficnwmi

Computational intelligence in processing of speech acoustics: a survey

Amitoj Singh, Navkiran Kaur, Vinay Kukreja, Virender Kadyan, Munish Kumar
2022 Complex & Intelligent Systems  
However, a limited number of automatic speech recognition systems are available for commercial use.  ...  DNN.  ...  , large vocabulary continuous speech recognition, and ASR systems.  ... 
doi:10.1007/s40747-022-00665-1 fatcat:6pu2xccbq5as7bn2y2tav2fdwa

Segmental Recurrent Neural Networks for End-to-End Speech Recognition

Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals
2016 Interspeech 2016  
We achieved 17.3% phone error rate (PER) from the first-pass decoding, the best reported result using CRFs, despite using only a zeroth-order CRF and no language model.  ...  In this paper, we discuss practical training and decoding issues as well as the method to speed up the training in the context of speech recognition. We performed experiments on the TIMIT dataset.  ...  Using bi-directional RNNs is straightforward. Conditional Maximum Likelihood Training For speech recognition, the segmentation labels E are usually unknown, training  ... 
doi:10.21437/interspeech.2016-40 dblp:conf/interspeech/LuKDSR16 fatcat:olzitewv5zcefivvhvtyftd2mu

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention – w/o Data Augmentation [article]

Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
2019 arXiv   pre-print
We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN/HMM architecture compared to an attention-based encoder-decoder design for the LibriSpeech task.  ...  Both hybrid DNN/HMM and attention-based systems employ bi-directional LSTMs for acoustic modeling/encoding. For language modeling, we employ both LSTM and Transformer based architectures.  ...  The work reflects only the authors' views and none of the funding parties is responsible for any use that may be made of the information it contains.  ... 
arXiv:1905.03072v2 fatcat:7ztem2scujgxfea5a66hmto4sa

The ICSTM+TUM+UP Approach to the 3rd CHIME Challenge: Single-Channel LSTM Speech Enhancement with Multi-Channel Correlation Shaping Dereverberation and LSTM Language Models [article]

Amr El-Desoky Mousa, Erik Marchi, Björn Schuller
2015 arXiv   pre-print
Our system uses Bidirectional Long Short-Term Memory (BLSTM) Recurrent Neural Networks (RNNs) for Single-channel Speech Enhancement (SSE).  ...  The first is the Phase-Error based Filtering (PEF) that uses time-varying phase-error filters based on estimated time-difference of arrival of the speech source and the phases of the microphone signals  ...  Time-frequency blocks with large PE are scaled down in amplitude, whereas, blocks with low PE are preserved. First, the PE is computed from the two phase spectra.  ... 
arXiv:1510.00268v1 fatcat:ufslzkzzcbcbrblagde2viv33i

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention

Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
2019 Interspeech 2019  
We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN/HMM architecture compared to an attention-based encoder-decoder design for the LibriSpeech task.  ...  Both hybrid DNN/HMM and attentionbased systems employ bi-directional LSTMs for acoustic modeling/encoding. For language modeling, we employ both LSTM and Transformer based architectures.  ...  The work reflects only the authors' views and none of the funding parties is responsible for any use that may be made of the information it contains.  ... 
doi:10.21437/interspeech.2019-1780 dblp:conf/interspeech/LuscherBIKMZSN19 fatcat:qrbwgptdjbd4bb2hcdyvc6igz4

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

G. E. Dahl, Dong Yu, Li Deng, A. Acero
2012 IEEE Transactions on Audio, Speech, and Language Processing  
We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition.  ...  Index Terms-Artificial neural network-hidden Markov model (ANN-HMM), context-dependent phone, deep belief network, deep neural network hidden Markov model (DNN-HMM), speech recognition, large-vocabulary  ...  Liu of the Microsoft Corporation speech product team for his assistance in getting the discriminatively-trained CD-GMM-HMM baselines, Dr. J.  ... 
doi:10.1109/tasl.2011.2134090 fatcat:b34qzq3bbzhodduhl5bt4vzune

Comparison of Hidden Markov Model and Recurrent Neural Network in Automatic Speech Recognition

Akshay Madhav Deshmukh
2020 European Journal of Engineering Research and Science  
The idea of using DNNs for Automatic Speech Recognition has gone further from being a single component in a pipeline to building a system mainly based on such a network. This paper provides a literature  ...  Understanding human speech precisely by a machine has been a major challenge for many years. With Automatic Speech Recognition (ASR) being decades old and considering the advancement of the technology,  ...  The Character Error Rate and Word Error Rate results from a BDRNN (Bi-Directional Recurrent Deep Neural Network) system are depicted.  ... 
doi:10.24018/ejers.2020.5.8.2077 fatcat:yxf3da6jyzcoxcjftcsytvytbq
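
The Character Error Rate and Word Error Rate this survey compares are both edit-distance metrics: the minimum number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, normalized by reference length. A minimal sketch (function names are illustrative; the normalization follows common practice):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    d = list(range(len(hyp) + 1))  # DP row: distances vs. empty reference
    for i, r in enumerate(ref, 1):
        diag = d[0]
        d[0] = i
        for j, h in enumerate(hyp, 1):
            diag, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   diag + (r != h))   # substitution / match
    return d[-1]

def wer(ref_sentence, hyp_sentence):
    """Word error rate: word-level edits divided by reference word count."""
    ref = ref_sentence.split()
    return edit_distance(ref, hyp_sentence.split()) / len(ref)
```

Character Error Rate is the same computation applied to character sequences instead of word sequences.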

Deep Learning: Methods and Applications

Li Deng
2014 Foundations and Trends® in Signal Processing  
large vocabulary speech recognition.  ...  from phone recognition to large vocabulary speech recognition.  ... 
doi:10.1561/2000000039 fatcat:vucffxhse5gfhgvt5zphgshjy4

Combining Residual Networks with LSTMs for Lipreading [article]

Themos Stafylakis, Georgios Tzimiropoulos
2017 arXiv   pre-print
We propose an end-to-end deep learning architecture for word-level visual speech recognition.  ...  The proposed network attains word accuracy equal to 83.0%, yielding a 6.8% absolute improvement over the current state-of-the-art, without using information about word boundaries during training or testing  ...  The former approach is considered more pertinent to tasks like isolated word recognition, classification and detection, while the latter to sentence-level classification and large vocabulary continuous  ... 
arXiv:1703.04105v4 fatcat:3nrrs4ndfjbzlfvliijg5isoya

EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding

Yajie Miao, Mohammad Gowayyed, Florian Metze
2015 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)  
The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs).  ...  Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting contextindependent targets (phonemes or characters).  ...  In the hybrid HMM/DNN approach, DNNs are used to classify speech frames into clustered context-dependent (CD) states (i.e., senones).  ... 
doi:10.1109/asru.2015.7404790 dblp:conf/asru/MiaoGM15 fatcat:yoqr5idcnjhifidhsq4aw3c5dq
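
EESEN trains its RNN with connectionist temporal classification (CTC), whose simplest decoding rule (best-path) first collapses repeated frame labels and then drops blanks. A minimal sketch, assuming label 0 is the blank symbol (the full system instead composes the RNN outputs with WFSTs, which this toy does not show):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """CTC best-path rule: collapse repeats, then remove blanks."""
    out, prev = [], None
    for lab in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

For example, the per-frame argmax sequence [0, 1, 1, 0, 1, 2, 2, 0] decodes to [1, 1, 2]: the repeated 1s and 2s collapse, while the blank between the two 1s keeps them distinct.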
Showing results 1 — 15 out of 271