A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition
2020
Interspeech 2020
Recently, there has been a strong push to transition from hybrid models to end-to-end (E2E) models for automatic speech recognition. Currently, there are three promising E2E methods: recurrent neural network transducer (RNN-T), RNN attention-based encoder-decoder (AED), and Transformer-AED. In this study, we conduct an empirical comparison of RNN-T, RNN-AED, and Transformer-AED models, in both non-streaming and streaming modes. We use 65 thousand hours of Microsoft anonymized training data to
doi:10.21437/interspeech.2020-2846
dblp:conf/interspeech/Li0G0Z020
fatcat:2xfo2lo4q5cgbgecg3lufby7oq