Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
[article]
2017
arXiv
pre-print
We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. ...
The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. ...
Figure 1 shows the extended architecture, which includes joint decoding, a deep CNN encoder and an RNN-LM network. ...
arXiv:1706.02737v1
fatcat:cdswlmukebhnvepo6phfezmnae
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
2017
Interspeech 2017
unpublished
doi:10.21437/interspeech.2017-1296
fatcat:cidkl3ehzzfqnpahb5mamecr7u
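The two records above (the arXiv preprint and the Interspeech 2017 version) describe training with the standard multi-task objective of the joint CTC/attention framework, which interpolates the CTC and attention losses with a tunable weight. A minimal pure-Python sketch; the weight value and the per-utterance loss numbers below are illustrative, not taken from the paper:

```python
def joint_ctc_attention_loss(ctc_loss, att_loss, lam=0.3):
    """Multi-task objective of joint CTC/attention training:
    L = lam * L_CTC + (1 - lam) * L_attention."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("CTC weight lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * att_loss

# Hypothetical per-utterance negative log-likelihoods:
loss = joint_ctc_attention_loss(ctc_loss=42.0, att_loss=30.0, lam=0.3)
```

In practice both terms come from the same shared encoder, so the CTC branch acts as a regularizer that encourages monotonic alignments during attention training.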
Multi-encoder multi-resolution framework for end-to-end speech recognition
[article]
2018
arXiv
pre-print
Attention-based methods and Connectionist Temporal Classification (CTC) network have been promising research directions for end-to-end Automatic Speech Recognition (ASR). ...
In this work, we present a novel Multi-Encoder Multi-Resolution (MEMR) framework based on the joint CTC/Attention model. ...
Two parallel encoders with heterogeneous structures, RNN-based and CNN-RNN-based, are mutually complementary in characterizing the speech signal. ...
arXiv:1811.04897v1
fatcat:5kpfcdtarbfcrch6dbzoq4lfyu
Multi-Stream End-to-End Speech Recognition
[article]
2019
arXiv
pre-print
Attention-based methods and Connectionist Temporal Classification (CTC) network have been promising research directions for end-to-end (E2E) Automatic Speech Recognition (ASR). ...
In this work, we present a multi-stream framework based on joint CTC/Attention E2E ASR with parallel streams represented by separate encoders aiming to capture diverse information. ...
JOINT CTC/ATTENTION MECHANISM
In this section, we review the joint CTC/attention architecture, which takes advantage of both CTC and attention-based end-to-end ASR approaches during training and decoding ...
arXiv:1906.08041v2
fatcat:xcsggemftfctzc4uioohemtpua
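The hierarchical attention described in the multi-stream entries above adds a second, stream-level softmax that weights each encoder's context vector before decoding. A pure-Python sketch under simplifying assumptions: the per-stream relevance scores would normally be produced by a small learned network, but here they are plain numbers, and the context vectors are toy 2-dimensional lists:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_streams(stream_contexts, stream_scores):
    """Stream-level attention: weight each encoder's context vector
    by a softmax over per-stream relevance scores, then sum."""
    weights = softmax(stream_scores)
    dim = len(stream_contexts[0])
    return [sum(w * c[i] for w, c in zip(weights, stream_contexts))
            for i in range(dim)]
```

With equal scores the fusion degenerates to an average of the streams; a dominant score steers the decoder toward the most informative encoder, as the abstract describes.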
End-to-end Speech Recognition with Word-based RNN Language Models
[article]
2018
arXiv
pre-print
In our prior work, we have proposed a multi-level LM, in which character-based and word-based RNN-LMs are combined in hybrid CTC/attention-based ASR. ...
This paper investigates the impact of word-based RNN language models (RNN-LMs) on the performance of end-to-end automatic speech recognition (ASR). ...
CONCLUSION In this paper, we proposed a word-based RNN language model (RNN-LM) including a look-ahead mechanism for end-to-end automatic speech recognition (ASR). ...
arXiv:1808.02608v1
fatcat:q7jqea25dver3aidalglagh3cy
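Combining an external RNN-LM with the ASR model, as in the entry above, is usually done by shallow fusion: a weighted LM log-probability is added to the ASR score when ranking beam hypotheses. A hedged Python sketch; the weight `beta`, the hypothesis strings, and all log-probabilities are invented for illustration:

```python
def shallow_fusion_score(asr_logp, lm_logp, beta=0.5, word_bonus=0.0):
    """Log-linear interpolation used in shallow fusion:
    score = log p_ASR + beta * log p_LM (+ optional insertion bonus)."""
    return asr_logp + beta * lm_logp + word_bonus

# Re-ranking two hypothetical partial hypotheses in a beam,
# each stored as (text, asr_log_prob, lm_log_prob):
beam = [("a b", -3.2, -1.0), ("a c", -3.0, -4.0)]
rescored = sorted(
    beam,
    key=lambda h: shallow_fusion_score(h[1], h[2], beta=0.5),
    reverse=True,
)
```

Note how the LM term can overturn the raw ASR ranking: "a c" has the better acoustic score, but "a b" wins after fusion. The look-ahead mechanism in the paper addresses the extra wrinkle that a word-level LM can only score a hypothesis once a word boundary is reached.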
A new joint CTC-attention-based speech recognition model with multi-level multi-head attention
2019
EURASIP Journal on Audio, Speech, and Music Processing
A method called joint connectionist temporal classification (CTC)-attention-based speech recognition has recently received increasing focus and has achieved impressive performance. ...
A hybrid end-to-end architecture that adds an extra CTC loss to the attention-based model could force extra restrictions on alignments. ...
It was prepared as a speech recognition corpus by Vassil Panayotov.
Competing interests The authors declare that they have no competing interests. ...
doi:10.1186/s13636-019-0161-0
fatcat:22f5rozskbbpvmzw5o7lgy5tha
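The multi-head attention used in the entry above splits the feature dimension into several heads, attends within each head, and concatenates the results. A minimal single-query, pure-Python sketch with no learned projection matrices (real implementations project queries, keys, and values per head; the vectors below are toy data):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(query, keys, values):
    """Single-query scaled dot-product attention."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

def multi_head_attention(query, keys, values, n_heads=2):
    """Split the feature dimension into n_heads slices, attend per
    head, then concatenate the per-head context vectors (learned
    per-head projections are omitted in this sketch)."""
    d = len(query) // n_heads
    out = []
    for h in range(n_heads):
        s = slice(h * d, (h + 1) * d)
        out += attention(query[s], [k[s] for k in keys],
                         [v[s] for v in values])
    return out
```

The "multi-level" part of the paper stacks such attention at more than one layer of the network; this fragment only shows the per-head split-and-concatenate mechanics.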
Recent Progresses in Deep Learning based Acoustic Models (Updated)
[article]
2018
arXiv
pre-print
, and the attention-based sequence-to-sequence model. ...
In this paper, we summarize recent progresses made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques. ...
In [94], this was further advanced by decoding jointly with the scores from both the attention-based model and the CTC model. ...
arXiv:1804.09298v2
fatcat:yfxzxu6qanbndcnmt3loikqeym
Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling
[article]
2018
arXiv
pre-print
Incorporating an RNNLM also brings significant improvements in %WER, achieving recognition performance comparable to models trained with twice as much training data. ...
Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new direction in speech research. The approach benefits by performing model training without using lexicon and alignments. ...
Previous studies applying CNN in seq2seq speech recognition [22] also showed that incorporating a deep CNNs in the encoder could further boost the performance. ...
arXiv:1810.03459v1
fatcat:alhb44umqbe47elpfmnzvukj7y
An Overview of End-to-End Automatic Speech Recognition
2019
Symmetry
Recently, however, the HMM-deep neural network (DNN) model and the end-to-end model, both using deep learning techniques, have achieved performance beyond HMM-GMM. ...
Automatic speech recognition, especially large vocabulary continuous speech recognition, is an important issue in the field of machine learning. ...
[31] introduced a CNN and combined it with an RNN for ASR, designing a structure consisting of four CNN layers, two dense layers, two RNN layers, and a CTC layer. ...
doi:10.3390/sym11081018
fatcat:ea3ohiy765clzbj7yulonvz7eu
Identifying the influence of transfer learning method in developing an end-to-end automatic speech recognition system with a low data level
2022
Eastern-European Journal of Enterprise Technologies
Many research papers have shown that deep learning methods make it easier to train automatic speech recognition systems that use an end-to-end approach. ...
To increase efficiency and quickly solve the problem associated with a limited resource, transfer learning was used for the end-to-end model. ...
Acknowledgments This research has been funded by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. AP09259309). ...
doi:10.15587/1729-4061.2022.252801
fatcat:drbererhabdzjilmcj5xpqwxrm
Recent Advances in End-to-End Automatic Speech Recognition
[article]
2022
arXiv
pre-print
Recently, the speech community is seeing a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). ...
In this paper, we will overview the recent advances in E2E models, focusing on technologies addressing those challenges from the industry's perspective. ...
with key technologies:
key technologies | year | model | encoder | test-clean/other WER
Deep Speech 2: more labeled data, curriculum learning [325] | 2016 | CTC | bi-RNN | 5.3/13.2
policy learning, joint training [326] | 2018 | CTC | CNN+bi-GRU | ...
arXiv:2111.01690v2
fatcat:6pktwep34jdvjklw4gkri4yn4y
The ASRU 2019 Mandarin-English Code-Switching Speech Recognition Challenge: Open Datasets, Tracks, Methods and Results
[article]
2020
arXiv
pre-print
Three tracks were set for advancing the AM and LM part in traditional DNN-HMM ASR system, as well as exploring the E2E models' performance. ...
This paper describes the design and main outcomes of the ASRU 2019 Mandarin-English code-switching speech recognition challenge, which aims to improve the ASR performance in Mandarin-English code-switching ...
Encoder-decoder based systems such as LAS [7] and the Transformer [6] use global attention and multi-head self-attention to generate implicit alignments. ...
arXiv:2007.05916v1
fatcat:yjuo2ejtwbacph7tgcymjocmvu
Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model
[article]
2018
arXiv
pre-print
In this paper we proposed a novel Adversarial Training (AT) approach for end-to-end speech recognition using a Criticizing Language Model (CLM). ...
Moreover, AT can be applied to any end-to-end ASR model using any deep-learning-based language modeling frameworks, and compatible with any existing end-to-end decoding method. ...
"+LM" refers to shallow fusion decoding jointly with RNN-LM [13] , "+AT" refers to the adversarial training proposed here, "+Both" indicates training with AT and joint decoding with RNN-LM, and BT is ...
arXiv:1811.00787v1
fatcat:c7jgqcmbtva6fdwcjoo3cwdlqq
Stream attention-based multi-array end-to-end speech recognition
[article]
2019
arXiv
pre-print
Motivated by the advances of joint Connectionist Temporal Classification (CTC)/attention mechanism in the End-to-End (E2E) ASR, a stream attention-based multi-array framework is proposed in this work. ...
In terms of attention, a hierarchical structure is adopted. On top of the regular attention networks, stream attention is introduced to steer the decoder toward the most informative encoders. ...
Joint CTC/Attention Architecture for End-to-End ASR
Table 1. Description of the array configuration in the two-stream E2E experiments. ...
arXiv:1811.04903v2
fatcat:q6fcqmmytrfopo35iqgqan5coa
Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language
2022
Sensors
In this paper, we propose an End-To-End Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language ...
The proposed approach reduces training time and improves speech recognition accuracy by effectively using CTC objective function in attention model training. ...
The CTC weight for the joint training with the attention model was set to 0.3. During the test phase, the CTC weight for the joint decoding was set to 0.6. ...
doi:10.3390/s22103683
fatcat:5iabrkpp2vhkjh6rcxrykfwei4
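The entry above gives concrete interpolation weights: 0.3 for the CTC branch during joint training and 0.6 for joint decoding. The decoding-time combination, including an optional RNN-LM term, reduces to a log-linear score over hypothesis log-probabilities; a plain-Python sketch (the LM weight of 0.3 is a hypothetical default, not taken from the paper):

```python
def joint_decoding_score(ctc_logp, att_logp, lm_logp=0.0,
                         ctc_weight=0.6, lm_weight=0.3):
    """Beam-search scoring at decoding time:
    ctc_weight * log p_CTC + (1 - ctc_weight) * log p_attention
    + lm_weight * log p_LM (shallow-fusion term)."""
    return (ctc_weight * ctc_logp
            + (1.0 - ctc_weight) * att_logp
            + lm_weight * lm_logp)
```

A heavier CTC weight at decoding than at training, as reported here, is a common choice: the CTC score's monotonic alignment constraint helps prune degenerate attention hypotheses during search.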
Showing results 1–15 of 79 results