Analysing acoustic model changes for active learning in automatic speech recognition

Chenhao Wu, Raymond W. M. Ng, Oscar Saz Torralba, Thomas Hain
2017 International Conference on Systems, Signals and Image Processing (IWSSIP)
In active learning for Automatic Speech Recognition (ASR), a portion of the data is automatically selected for manual transcription. The objective is to improve ASR performance with retrained acoustic models. Standard approaches are based on the confidence of individual sentences. In this study, we take an alternative view of transcript label quality, in which Gaussian Supervector Distance (GSD) is used as a criterion for data selection. GSD is a metric which quantifies how the model was ... d during its adaptation. Using an automatic speech recognition transcript derived from an out-of-domain acoustic model, unsupervised adaptation was conducted and GSD was computed. The adapted model was then applied to an audio book transcription task. It is found that GSD provides hints for predicting data transcription quality. A preliminary attempt at active learning shows the effectiveness of the GSD selection criterion over random selection, shedding light on its prospective use.
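As a rough illustration of the selection idea described in the abstract, the sketch below scores each candidate by a distance between model parameters before and after adaptation, then picks a fixed budget of candidates for manual transcription. The abstract does not give the exact GSD formula, so everything here is an assumption: the supervector is taken to be the concatenated Gaussian mean vectors, the distance is plain Euclidean, and the function names (`gaussian_supervector`, `gsd`, `select_for_transcription`) and the `largest_first` switch are hypothetical. Whether high or low GSD should be preferred is also not stated in the abstract, which is why the direction is left as a parameter.

```python
import numpy as np


def gaussian_supervector(means):
    """Flatten an (n_gaussians, dim) array of mean vectors into one supervector.

    Assumption: the supervector is built from the means only.
    """
    return np.asarray(means, dtype=float).reshape(-1)


def gsd(means_before, means_after):
    """Euclidean distance between supervectors before and after adaptation.

    Assumption: an unnormalised Euclidean distance; the paper's definition
    may weight or normalise the components differently.
    """
    diff = gaussian_supervector(means_after) - gaussian_supervector(means_before)
    return float(np.linalg.norm(diff))


def select_for_transcription(scores, budget, largest_first=True):
    """Return the indices of `budget` candidates ranked by their score.

    `largest_first` is a hypothetical switch: the abstract does not say
    which end of the GSD range is selected.
    """
    order = np.argsort(np.asarray(scores, dtype=float))
    if largest_first:
        order = order[::-1]
    return order[:budget].tolist()
```

With one score per candidate segment, `select_for_transcription(scores, budget)` returns the indices to send for manual transcription; random selection, the baseline mentioned in the abstract, would correspond to drawing `budget` indices uniformly instead.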
doi:10.1109/iwssip.2017.7965609 dblp:conf/iwssip/WuNSH17 fatcat:sybspv53y5amxge7e7rvk7ll74