The file type is application/pdf.
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
[article]
2020
arXiv
pre-print
Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis (text-to-speech, or TTS), we build our TTS system on an ASR training database and then extend the data with synthesized speech to train a recognition model. We argue that, when the training data amount is relatively low, this approach can allow an
arXiv:2005.07157v2
fatcat:qgp3erhkwjgh5kxg2laoqdxmr4