The Impact of Intent Distribution Mismatch on Semi-Supervised Spoken Language Understanding

Judith Gaspers, Quynh Do, Daniil Sorokin, Patrick Lehnen
2021 Conference of the International Speech Communication Association  
With the expanding role of voice-controlled devices, bootstrapping spoken language understanding models from little labeled data becomes essential. Semi-supervised learning is a common technique to improve model performance when labeled data is scarce. In a real-world production system, the labeled data and the online test data often come from different distributions. In this work, we use semi-supervised learning based on pseudo-labeling with an auxiliary task on incoming unlabeled noisy ..., which is closer to the test distribution. We demonstrate empirically that our approach can mitigate the negative effects of training with non-representative labeled data as well as the negative impact of noise in the data introduced by pseudo-labeling and automatic speech recognition.
doi:10.21437/interspeech.2021-335 dblp:conf/interspeech/GaspersDSL21 fatcat:a2zil2mnjrfmdnhcmhxck3kvje