Context-aware RNNLM Rescoring for Conversational Speech Recognition [article]

Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie
2020 arXiv   pre-print
Conversational speech recognition is regarded as a challenging task due to its free-style speaking and long-term contextual dependencies.  ...  To further take advantage of the persisted nature during a conversation, such as topics or speaker turn, we extend the rescoring procedure to a new context-aware manner.  ...  Conclusions: In this work, we propose a context-aware lattice rescoring method for RNNLMs to capture topic effects and long-distance triggers for conversational speech recognition.  ... 
arXiv:2011.09301v1 fatcat:pjvxrmgfbncndevvfhx2t62pbe

Improving English Conversational Telephone Speech Recognition

Ivan Medennikov, Alexey Prudnikov, Alexander Zatvornitskiy
2016 Interspeech 2016  
Index Terms: conversational telephone speech recognition, deep neural networks, recurrent neural networks  ...  is added to the CE criterion for penalizing parameters deviation from the source model.  ...  The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system.  ...  Introduction: English conversational telephone speech (CTS) recognition systems are becoming better and better each year.  ... 
doi:10.21437/interspeech.2016-473 dblp:conf/interspeech/MedennikovPZ16 fatcat:fzdt6ues6zb6hbppuilxbrg5ue

The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge [article]

Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Żelasko, Paola García, Shinji Watanabe, Sanjeev Khudanpur
2020 arXiv   pre-print
This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CHiME-6 challenge for distant multi-microphone conversational speech diarization and recognition in everyday home environments.  ...  activity detection, PLDA score fusion for diarization, and lattice combination for automatic speech recognition (ASR).  ...  The challenge aims to improve speech recognition and speaker diarization for far-field conversational speech in challenging environments in a multimicrophone setting.  ... 
arXiv:2006.07898v1 fatcat:aetajvztqzbyfmtcdsit6ms72u

Neural Language Modeling with Implicit Cache Pointers

Ke Li, Daniel Povey, Sanjeev Khudanpur
2020 Interspeech 2020  
A cache-inspired approach is proposed for neural language models (LMs) to improve long-range dependency and better predict rare words from long contexts.  ...  N-best rescoring experiments on Switchboard indicate that it benefits both very rare and frequent words.  ...  Introduction: Neural language models (LMs) are an important module in automatic speech recognition (ASR) [1, 2, 3].  ... 
doi:10.21437/interspeech.2020-3020 dblp:conf/interspeech/LiPK20 fatcat:ruoyv7qs6vfw7bu6vmszusfpsi

Neural Language Modeling With Implicit Cache Pointers [article]

Ke Li, Daniel Povey, Sanjeev Khudanpur
2020 arXiv   pre-print
A cache-inspired approach is proposed for neural language models (LMs) to improve long-range dependency and better predict rare words from long contexts.  ...  N-best rescoring experiments on Switchboard indicate that it benefits both very rare and frequent words.  ...  Introduction: Neural language models (LMs) are an important module in automatic speech recognition (ASR) [1, 2, 3].  ... 
arXiv:2009.13774v1 fatcat:64dzobqqh5h4ffw27dxof47roy

Integrating meta-information into recurrent neural network language models

Yangyang Shi, Martha Larson, Joris Pelemans, Catholijn M. Jonker, Patrick Wambacq, Pascal Wiggers, Kris Demuynck
2015 Speech Communication  
Due to their advantages over conventional n-gram language models, recurrent neural network language models (RNNLMs) recently have attracted a fair amount of research attention in the speech recognition  ...  For the purposes of our investigation, we assume that information on the SSS can be captured at the moment at which speech is recorded.  ...  WER is evaluated by carrying out a rescoring experiment that takes as input the N-best list generated by the speech recognition system.  ... 
doi:10.1016/j.specom.2015.06.006 fatcat:u57r2j3gd5c2vbiulfoot4auue

End-to-End Speech Recognition: A review for the French Language [article]

Florian Boyer, Jean-Luc Rouas
2019 arXiv   pre-print
In this paper we propose a review of the existing end-to-end ASR approaches for the French language.  ...  extra linguistic resources such as dictionaries or language models, is the capacity to model acoustic units such as characters, subwords or directly words; opening up the capacity to directly translate speech  ...  Following lattice rescoring approach proposed in [31], decoding was then performed with the RNNLM for all baseline systems.  ... 
arXiv:1910.08502v2 fatcat:yndio25lzvcw3ndeosrtbwnha4

Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech

Roberto Gretter, Marco Matassoni, Daniele Falavigna, Keelan Evanini, Chee Wee Leong
2020 Interspeech 2020  
We present an overview of the ASR challenge for non-native children's speech organized for a special session at Interspeech 2020.  ...  The data for the challenge was obtained in the context of a spoken language proficiency assessment administered at Italian schools for students between the ages of 9 and 16 who were studying English and  ...  model ensemble CNN-TDNNF 3 18.71 kaldi TDNN-BLSTM n-gram, LM rescoring lattice MBR combination 4 18.80 kaldi TDNN n-grams, RNNLM rescoring - 5 19.64 unknown unknown unknown unknown  ... 
doi:10.21437/interspeech.2020-2133 dblp:conf/interspeech/GretterMFEL20 fatcat:g76pgsu57vac7by6ajhd6exrem

Semantic language models with deep neural networks

Ali Orkan Bayer, Giuseppe Riccardi
2016 Computer Speech and Language  
The first one is automatic speech recognition (ASR) which recognizes what the user says. The second one is spoken language understanding (SLU) which understands what the user means.  ...  LMs constrain the search space that is used in the search for the best hypothesis. Therefore, they play a crucial role in the performance of SLS.  ...  The statistical speech recognition approach models the speech recognition problem as follows [68] .  ... 
doi:10.1016/j.csl.2016.04.001 fatcat:2ybfzvyavngkfn2rbtrbotnhc4

D2.1 Libraries and tools for multimodal content analysis

David Doukhan, Danny Francis, Benoit Huet, Sami Keronen, Mikko Kurimo, Jorma Laaksonen, Tiina Lindh-Knuutila, Bernard Merialdo, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Kim Viljanen
2018 Zenodo  
This deliverable describes a joint collection of libraries and tools for multimodal content analysis created by the MeMAD project partners.  ...  Acknowledgements: Computational resources were provided by the Aalto Science-IT project and the CSC - IT Center for Science, Finland.  ...  For the rescored models the amount of n-gram contexts depends on the level of segmentation, ranging from 50M contexts for the word model to 200M contexts for the character models.  ... 
doi:10.5281/zenodo.3697989 fatcat:bde5x3yggzb2jk2fh2mu6t5wxy

Acoustic-to-Word Recognition with Sequence-to-Sequence Models [article]

Shruti Palaskar, Florian Metze
2018 arXiv   pre-print
Acoustic-to-Word recognition provides a straightforward solution to end-to-end speech recognition without needing external decoding, language model re-scoring or lexicon.  ...  We present effective methods to train Sequence-to-Sequence models for direct word-level recognition (and character-level recognition) and show an absolute improvement of 4.4-5.0% in Word Error Rate on  ...  We also thank the CMU speech group for many useful discussions. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPUs used for this research.  ... 
arXiv:1807.09597v2 fatcat:5yiise7hp5fafgyc3fvy4tmwj4

Convolutional Neural Networks for Raw Speech Recognition [chapter]

Vishal Passricha, Rajesh Kumar Aggarwal
2018 From Natural to Artificial Intelligence - Algorithms and Applications  
State-of-the-art automatic speech recognition (ASR) systems map the speech signal into its corresponding text. Traditional ASR systems are based on Gaussian mixture model.  ...  In this chapter, CNN-based acoustic model for raw speech signal is discussed. It establishes the relation between raw speech signal and phones in a data-driven manner.  ...  The combination of RNNLMs and m-gram language model is generally used and it works on a rescoring technique.  ... 
doi:10.5772/intechopen.80026 fatcat:ni6csin5obgrpfdogpwgzjkphq

Automatic Speech Recognition: Systematic Literature Review

Sadeen Alharbi, Muna Alrazgan, Alanoud Alrashed, Turkiah AlNomasi, Raghad Almojel, Rimah Alharbi, Saja Alharbi, Sahar Alturki, Fatimah Alshehri, Maha Almojil
2021 IEEE Access  
ACKNOWLEDGMENT The authors thank the Deanship of Scientific Research and RSSU at King Saud University for their technical support.  ...  They used a pitch-dependent acoustic mismatch in the context of children's speech recognition on adults' speech-trained models.  ...  proposed a context-sensitive candidate label approach to smooth the training of recurrent neural network language models (RNNLMs), and it enhanced the ASR performance.  ... 
doi:10.1109/access.2021.3112535 fatcat:uhyhmyd6b5d2lldkhf6tihnxky

Computational intelligence in processing of speech acoustics: a survey

Amitoj Singh, Navkiran Kaur, Vinay Kukreja, Virender Kadyan, Munish Kumar
2022 Complex & Intelligent Systems  
This paper examined major challenges for speech recognition for different languages.  ...  An immense number of frameworks are available for speech processing and recognition for languages persisting around the globe.  ...  Dutta K, Sarma KK (2012) Multiple feature extraction for RNN-based Assamese speech recognition for speech to text conversion application.  ... 
doi:10.1007/s40747-022-00665-1 fatcat:6pu2xccbq5as7bn2y2tav2fdwa

Adaptation Algorithms for Speech Recognition: An Overview [article]

Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski
2020 arXiv   pre-print
We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network  ...  We present a meta-analysis of the performance of speech recognition adaptation algorithms, based on relative error rate reductions as reported in the literature.  ...  The closest analogies to this in speech recognition are some of the domain recognition approaches discussed in Sec. XI and for multilingual speech recognition.  ... 
arXiv:2008.06580v1 fatcat:7cukuwdfjvdtpdnxb6gdmfywbu