Improving DNN-Based Automatic Recognition of Non-native Children Speech with Adult Speech

Yao Qian, Xinhao Wang, Keelan Evanini, David Suendermann-Oeft
2016 Workshop on Child Computer Interaction  
Acoustic models for state-of-the-art DNN-based speech recognition systems are typically trained using at least several hundred hours of task-specific training data. However, this amount of training data is not always available for some applications. In this paper, we investigate how to use an adult speech corpus to improve DNN-based automatic speech recognition for non-native children's speech. Although there are many acoustic and linguistic mismatches between the speech of adults and children, adult speech can still be used to boost the performance of a speech recognizer for children using acoustic modeling techniques based on the DNN framework. The experimental results show that the best recognition performance can be achieved by combining children's training data with adult training data of approximately the same size and initializing the DNN with the weights obtained by pre-training using the full training set of the adult corpus. This system can outperform the baseline system trained on only children's speech with an overall relative WER reduction of 11.9%. Among the three speaking tasks studied, the picture narration task shows the largest gain with a WER reduction from 24.6% to 20.1%.
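The abstract mixes a relative figure (11.9% overall) with absolute WER values (24.6% to 20.1% for picture narration). The sketch below shows how the picture narration numbers would translate into a relative reduction, assuming the usual definition (baseline minus improved, divided by baseline). The abstract does not state the formula, and the function name `relative_wer_reduction` is hypothetical.

```python
def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER.

    Assumes the conventional definition:
        100 * (baseline - improved) / baseline
    Both inputs are WER values expressed in percent.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# Picture narration task, WER values taken from the abstract.
absolute_drop = 24.6 - 20.1
relative_drop = relative_wer_reduction(24.6, 20.1)

print(f"absolute reduction: {absolute_drop:.1f} points")
print(f"relative reduction: {relative_drop:.1f}%")
```

Under this definition the picture narration gain is 4.5 absolute points, or roughly 18.3% relative, which is consistent with it being larger than the 11.9% overall relative reduction reported across the three tasks.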
doi:10.21437/wocci.2016-7 dblp:conf/wocci/QianWES16 fatcat:rw46atb4ubhsxmb3cbdzqxld3i