Meta-Learning for improving rare word recognition in end-to-end ASR
[article]
2021
arXiv
pre-print
of combining their outcomes into an end-to-end automatic speech recognition system to improve rare word recognition. ...
We propose a new method of generating meaningful embeddings for speech, changes to four commonly used meta learning approaches to enable them to perform keyword spotting in continuous signals and an approach ...
INTRODUCTION While end-to-end (E2E) [1] deep learning (DL) models brought great improvements to the field of automatic speech recognition (ASR) in recent years and reduced word error rates (WER) on benchmark ...
arXiv:2102.12624v1
fatcat:25roojpldfe3paqfxdifelq4wi
Multi-task Language Modeling for Improving Speech Recognition of Rare Words
[article]
2021
arXiv
pre-print
Our best ASR system with multi-task LM shows 4.6% WERR deduction compared with RNN Transducer only ASR baseline for rare words recognition. ...
In this paper, we propose a second-pass system with multi-task learning, utilizing semantic targets (such as intent and slot prediction) to improve speech recognition performance. ...
We find that the improvement in WER is more pronounced for rare words, likely due to improvements in recognition of slot content. ...
arXiv:2011.11715v4
fatcat:wwlt4dvw75hvnh6vhxvcw4lngm
Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-switching Speech Recognition
[article]
2020
arXiv
pre-print
In this paper, we conduct data selection analysis in building an English-Mandarin code-switching (CS) speech recognition (CSSR) system, which is aimed for a real CSSR contest in China. ...
Then to exploit monolingual data, we find data matching is crucial. Mandarin data is closely matched with the Mandarin part in the code-switching data, while English data is not. ...
Acknowledgements The computational work for this paper is partially performed on the resources of the National Supercomputing Centre (NSCC), Singapore (https://www.nscc.sg).
arXiv:2006.07094v2
fatcat:g5pql34bozdhxgaj4e76jphyp4
Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-Switching Speech Recognition
2020
Interspeech 2020
In this paper, we conduct data selection analysis in building an English-Mandarin code-switching (CS) speech recognition (CSSR) system, which is aimed for a real CSSR contest in China. ...
The CSSR system can perform within-utterance code-switch recognition, but it still has a margin with the one trained on code-switching data. 1 Here, data selection simply means how to reasonably exploit ...
This has been extensively studied under the End-to-end (E2E) ASR framework [23]. ...
doi:10.21437/interspeech.2020-1582
dblp:conf/interspeech/ZhangXPHC20
fatcat:oqtng33rr5cczaaoz3qndphxpq
Context-Aware Dialog Re-Ranking for Task-Oriented Dialog Systems
[article]
2018
arXiv
pre-print
By using neural word embedding-based models and handcrafted or logistic regression-based ensemble models, we have improved the performance of a recently proposed end-to-end task-oriented dialog system ...
Furthermore, no previous studies have analyzed whether response ranking can improve the performance of existing dialog systems in real human-computer dialogs with speech recognition errors. ...
However, NN is not effective for ASR-Task 6 since it is quite rare for exactly the same pair to be found in the training dialog. ...
arXiv:1811.11430v1
fatcat:dky5fm4bkfh5pcas25kfo3e63u
Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model
[article]
2017
arXiv
pre-print
We apply sequence-to-sequence model to mitigate the impact of speech recognition errors on open domain end-to-end dialog generation. ...
The method shows that the sequence-to-sequence model can learn the ASR transcriptions and original text pair having the same meaning and eliminate the speech recognition errors. ...
While abounding works focusing on spoken language understanding has hastened ASR failure management in modular dialog systems, ASR error handling in end-to-end chatbots is rarely seen. ...
arXiv:1709.07862v2
fatcat:klvr5w4iynd5jazba4t2sm65ii
Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition
[article]
2021
arXiv
pre-print
Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition (ASR). ...
To alleviate this problem we supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly. ...
In order to solve this problem, in this paper we extend an end-to-end ASR system by a memory for words and phrases. ...
arXiv:2107.02268v1
fatcat:2afway63wjdtdevkbr3rxktc5m
Multimodal machine translation through visuals and speech
2020
Machine Translation
This survey reviews the major data resources for these tasks, the evaluation campaigns concentrated around them, the state of the art in end-to-end and pipeline approaches, and also the challenges in performance ...
These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by the requirement of models to generate outputs in a different language. ...
We would also like to thank Maarit Koponen for her valuable feedback and her help in establishing our discussions of machine translation evaluation. ...
doi:10.1007/s10590-020-09250-0
fatcat:jod3ghcsnnbipotcqp6sme4lna
System combination and score normalization for spoken term detection
2013
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. ...
First, we show score normalization methodology that improves in average by 20% keyword search performance. ...
In other words, ATWV metric emphasizes recall of rare terms. ...
doi:10.1109/icassp.2013.6639278
dblp:conf/icassp/MamouCCGKKMNPRSSW13
fatcat:3xtqd6xr75dktkbneeoait44de
Adaptive Feature Selection for End-to-End Speech Translation
[article]
2020
arXiv
pre-print
Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features. ...
In this paper, we propose adaptive feature selection (AFS) for encoder-decoder based E2E ST. ...
Acknowledgments We would like to thank Shucong Zhang for his great support on building our ASR baselines. IT acknowledges support of the European Research Council (ERC Starting grant 678254) and the ...
arXiv:2010.08518v2
fatcat:27fziwfdsnfffjvt2yasmg7p6e
Recent Advances in End-to-End Automatic Speech Recognition
[article]
2022
arXiv
pre-print
Recently, the speech community is seeing a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). ...
While E2E models achieve the state-of-the-art results in most benchmarks in terms of ASR accuracy, hybrid models are still used in a large proportion of commercial ASR systems at the current time. ...
In such a case, E2E models have not learned to map the rare words' acoustic signal to words. ...
arXiv:2111.01690v2
fatcat:6pktwep34jdvjklw4gkri4yn4y
Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition
2020
Interspeech 2020
In this paper, we propose a hierarchical multi-stage word-to-grapheme Named Entity Correction (NEC) algorithm. ...
We evaluate our solution on two different test sets from the call and music domains, for both server as well as on-device speech recognition configurations. ...
However, the misrecognition of rarely occurring words such as named entities (NEs) is a well-known shortcoming of end-to-end models [13]. ...
doi:10.21437/interspeech.2020-3174
dblp:conf/interspeech/GargGGSK20
fatcat:njufs2tp4fhjvmyd5li7fbfstu
Speech Retrieval
[chapter]
2011
Spoken Language Understanding
The primary technical challenges of speech retrieval lie in the retrieval system's ability to deal with imperfect speech recognition technology that produces errorful output due to misrecognitions caused by inadequate statistical models or out-of-vocabulary words. ...
The weights in the index transducer correspond to expected counts that are used for ranking.
Spoken Document Ranking in the Presence of Text Meta-Data Spoken documents rarely contain only speech. ...
doi:10.1002/9781119992691.ch15
fatcat:o36ulm7kh5dxvhm6alb4yz3qvy
Recent Progress in the CUHK Dysarthric Speech Recognition System
2021
IEEE/ACM Transactions on Audio Speech and Language Processing
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date. ...
This paper presents recent research efforts at the Chinese University of Hong Kong (CUHK) to improve the performance of disordered speech recognition systems on the largest publicly available UASpeech ...
ACKNOWLEDGMENT We thank Disong Wang for sharing their cross- ...
doi:10.1109/taslp.2021.3091805
fatcat:7ss4ldio3rdprfjkhufor6fkvu
Visual features for context-aware speech recognition
2017
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We achieve good improvements in both cases and compare and analyze the respective reductions in word error rate. ...
In this paper, we extend our earlier work on adapting the acoustic model of a DNN-based speech recognition system to an RNN language model and show how both can be adapted to the objects and scenes that ...
In the long term, this work should help to improve fully end-to-end "video-to-text" approaches, which generate image or video "summaries" based on multi-modal embeddings, and reference "captions" [35, ...
doi:10.1109/icassp.2017.7953112
dblp:conf/icassp/GuptaMNM17
fatcat:kg3whgbgv5aevmx6rrdneatymu