Context-aware RNNLM Rescoring for Conversational Speech Recognition
[article]
2020
arXiv
pre-print
Conversational speech recognition is regarded as a challenging task due to its free-style speaking and long-term contextual dependencies. ...
To further take advantage of the persisted nature during a conversation, such as topics or speaker turn, we extend the rescoring procedure to a new context-aware manner. ...
Conclusions: In this work, we propose a context-aware lattice rescoring method for RNNLMs to capture topic effects and long-distance triggers for conversational speech recognition. ...
arXiv:2011.09301v1
fatcat:pjvxrmgfbncndevvfhx2t62pbe
Improving English Conversational Telephone Speech Recognition
2016
Interspeech 2016
Index Terms: conversational telephone speech recognition, deep neural networks, recurrent neural networks ... is added to the CE criterion for penalizing parameter deviation from the source model. ...
The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. ...
Introduction: English conversational telephone speech (CTS) recognition systems are becoming better and better each year. ...
doi:10.21437/interspeech.2016-473
dblp:conf/interspeech/MedennikovPZ16
fatcat:fzdt6ues6zb6hbppuilxbrg5ue
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge
[article]
2020
arXiv
pre-print
This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CHiME-6 challenge for distant multi-microphone conversational speech diarization and recognition in everyday home environments. ...
activity detection, PLDA score fusion for diarization, and lattice combination for automatic speech recognition (ASR). ...
The challenge aims to improve speech recognition and speaker diarization for far-field conversational speech in challenging environments in a multi-microphone setting. ...
arXiv:2006.07898v1
fatcat:aetajvztqzbyfmtcdsit6ms72u
Neural Language Modeling with Implicit Cache Pointers
2020
Interspeech 2020
A cache-inspired approach is proposed for neural language models (LMs) to improve long-range dependency and better predict rare words from long contexts. ...
N-best rescoring experiments on Switchboard indicate that it benefits both very rare and frequent words. ...
Introduction: Neural language models (LMs) are an important module in automatic speech recognition (ASR) [1, 2, 3]. ...
doi:10.21437/interspeech.2020-3020
dblp:conf/interspeech/LiPK20
fatcat:ruoyv7qs6vfw7bu6vmszusfpsi
Neural Language Modeling With Implicit Cache Pointers
[article]
2020
arXiv
pre-print
A cache-inspired approach is proposed for neural language models (LMs) to improve long-range dependency and better predict rare words from long contexts. ...
N-best rescoring experiments on Switchboard indicate that it benefits both very rare and frequent words. ...
Introduction: Neural language models (LMs) are an important module in automatic speech recognition (ASR) [1, 2, 3]. ...
arXiv:2009.13774v1
fatcat:64dzobqqh5h4ffw27dxof47roy
Integrating meta-information into recurrent neural network language models
2015
Speech Communication
Due to their advantages over conventional n-gram language models, recurrent neural network language models (RNNLMs) recently have attracted a fair amount of research attention in the speech recognition ...
For the purposes of our investigation, we assume that information on the SSS can be captured at the moment at which speech is recorded. ...
WER is evaluated by carrying out a rescoring experiment that takes as input the N-best list generated by the speech recognition system. ...
doi:10.1016/j.specom.2015.06.006
fatcat:u57r2j3gd5c2vbiulfoot4auue
End-to-End Speech Recognition: A review for the French Language
[article]
2019
arXiv
pre-print
In this paper we propose a review of the existing end-to-end ASR approaches for the French language. ...
extra linguistic resources such as dictionaries or language models, is the capacity to model acoustic units such as characters, subwords or directly words; opening up the capacity to directly translate speech ...
Following the lattice rescoring approach proposed in [31], decoding was then performed with the RNNLM for all baseline systems. ...
arXiv:1910.08502v2
fatcat:yndio25lzvcw3ndeosrtbwnha4
Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech
2020
Interspeech 2020
We present an overview of the ASR challenge for non-native children's speech organized for a special session at Interspeech 2020. ...
The data for the challenge was obtained in the context of a spoken language proficiency assessment administered at Italian schools for students between the ages of 9 and 16 who were studying English and ...
model ensemble | CNN-TDNNF
3 | 18.71 | kaldi | TDNN-BLSTM | n-gram, LM rescoring | lattice MBR combination
4 | 18.80 | kaldi | TDNN | n-grams, RNNLM rescoring | -
5 | 19.64 | unknown | unknown | unknown | unknown ...
doi:10.21437/interspeech.2020-2133
dblp:conf/interspeech/GretterMFEL20
fatcat:g76pgsu57vac7by6ajhd6exrem
Semantic language models with deep neural networks
2016
Computer Speech and Language
The first one is automatic speech recognition (ASR) which recognizes what the user says. The second one is spoken language understanding (SLU) which understands what the user means. ...
LMs constrain the search space that is used in the search for the best hypothesis. Therefore, they play a crucial role in the performance of SLS. ...
The statistical speech recognition approach models the speech recognition problem as follows [68]. ...
doi:10.1016/j.csl.2016.04.001
fatcat:2ybfzvyavngkfn2rbtrbotnhc4
D2.1 Libraries and tools for multimodal content analysis
2018
Zenodo
This deliverable describes a joint collection of libraries and tools for multimodal content analysis created by the MeMAD project partners. ...
Acknowledgements: Computational resources were provided by the Aalto Science-IT project and the CSC - IT Center for Science, Finland. ...
For the rescored models the amount of n-gram contexts depends on the level of segmentation, ranging from 50M contexts for the word model to 200M contexts for the character models. ...
doi:10.5281/zenodo.3697989
fatcat:bde5x3yggzb2jk2fh2mu6t5wxy
Acoustic-to-Word Recognition with Sequence-to-Sequence Models
[article]
2018
arXiv
pre-print
Acoustic-to-Word recognition provides a straightforward solution to end-to-end speech recognition without needing external decoding, language model re-scoring or lexicon. ...
We present effective methods to train Sequence-to-Sequence models for direct word-level recognition (and character-level recognition) and show an absolute improvement of 4.4-5.0% in Word Error Rate on ...
We also thank the CMU speech group for many useful discussions. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPUs used for this research. ...
arXiv:1807.09597v2
fatcat:5yiise7hp5fafgyc3fvy4tmwj4
Convolutional Neural Networks for Raw Speech Recognition
[chapter]
2018
From Natural to Artificial Intelligence - Algorithms and Applications
State-of-the-art automatic speech recognition (ASR) systems map the speech signal into its corresponding text. Traditional ASR systems are based on Gaussian mixture model. ...
In this chapter, CNN-based acoustic model for raw speech signal is discussed. It establishes the relation between raw speech signal and phones in a data-driven manner. ...
A combination of RNNLMs and an m-gram language model is generally used, and it is applied through a rescoring technique. ...
doi:10.5772/intechopen.80026
fatcat:ni6csin5obgrpfdogpwgzjkphq
Automatic Speech Recognition: Systematic Literature Review
2021
IEEE Access
Acknowledgment: The authors thank the Deanship of Scientific Research and RSSU at King Saud University for their technical support. ...
They used a pitch-dependent acoustic mismatch in the context of children's speech recognition on adults' speech-trained models. ...
proposed a context-sensitive candidate label approach to smooth the training of recurrent neural network language models (RNNLMs), and it enhanced the ASR performance. ...
doi:10.1109/access.2021.3112535
fatcat:uhyhmyd6b5d2lldkhf6tihnxky
Computational intelligence in processing of speech acoustics: a survey
2022
Complex & Intelligent Systems
This paper examined major challenges for speech recognition for different languages. ...
An immense number of frameworks are available for speech processing and recognition for languages persisting around the globe. ...
Dutta K, Sarma KK (2012) Multiple feature extraction for RNN-based Assamese speech recognition for speech to text conversion application. ...
doi:10.1007/s40747-022-00665-1
fatcat:6pu2xccbq5as7bn2y2tav2fdwa
Adaptation Algorithms for Speech Recognition: An Overview
[article]
2020
arXiv
pre-print
We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network ...
We present a meta-analysis of the performance of speech recognition adaptation algorithms, based on relative error rate reductions as reported in the literature. ...
The closest analogies to this in speech recognition are some of the domain recognition approaches discussed in Sec. XI and for multilingual speech recognition. ...
arXiv:2008.06580v1
fatcat:7cukuwdfjvdtpdnxb6gdmfywbu