Combining Non-Pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological Speech

Emre Yılmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik
Interspeech 2016
Research on automatic speech recognition (ASR) of pathological speech is particularly hindered by scarce in-domain data resources. Collecting representative pathological speech data is difficult due to the large variability caused by the nature and severity of the disorders, and the rigorous ethical and medical permission requirements. This task becomes even more challenging for languages which have fewer resources, fewer speakers and fewer patients than English, such as the mid-sized language Dutch. In this paper, we investigate the impact of combining speech data from different varieties of the Dutch language for training deep neural network (DNN)-based acoustic models. Flemish is chosen as the target variety for testing the acoustic models, since a Flemish database of pathological speech, the COPAS database, is available. We use non-pathological speech data from the northern Dutch and Flemish varieties and perform speaker-independent recognition using the DNN-HMM system trained on the combined data. The results show that this system provides improved recognition of pathological Flemish speech compared to a baseline system trained only on Flemish data. These findings open up new opportunities for developing useful ASR-based pathological speech applications for languages that are smaller in size and less resourced than English.
doi:10.21437/interspeech.2016-109 dblp:conf/interspeech/YilmazGCS16 fatcat:ygz5khmp7vfjdbvqnjjzj56uve