Untranscribed Web Audio for Low Resource Speech Recognition

Andrea Carmantini, Peter Bell, Steve Renals
2019 Interspeech 2019  
Speech recognition models are highly susceptible to mismatch in the acoustic and language domains between the training and the evaluation data. For low resource languages, it is difficult to obtain transcribed speech for target domains, while untranscribed data can be collected with minimal effort. Recently, a method applying lattice-free maximum mutual information (LF-MMI) to untranscribed data has been found to be effective for semi-supervised training. However, weaker initial models and ... n mismatch can result in high deletion rates for the semi-supervised model. Therefore, we propose a method to force the base model to overgenerate possible transcriptions, relying on the ability of LF-MMI to deal with uncertainty. On data from the IARPA MATERIAL programme, our new semi-supervised method outperforms the standard semi-supervised method, yielding significant gains when adapting for mismatched bandwidth and domain.
doi:10.21437/interspeech.2019-2623 dblp:conf/interspeech/Carmantini0R19 fatcat:4rqhxle4jrawnotksgjulatyay