Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation
2021
Conference of the International Speech Communication Association
Semi- and self-supervised training techniques have the potential to improve the performance of speech recognition systems without additional transcribed speech data. In this work, we demonstrate the efficacy of two approaches to semi-supervision for automatic speech recognition. The two approaches leverage vast amounts of available unspoken text and untranscribed audio. First, we present factorized multilingual speech synthesis to improve data augmentation on unspoken text. Next, we propose the …
doi:10.21437/interspeech.2021-677
dblp:conf/interspeech/ChenRZZGHEWRM21
fatcat:zsycvz73lvbffkha3psy3mu75q