Classification of bisyllabic lexical stress patterns in disordered speech using deep learning
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Technology-based therapy tools can be of great benefit to children with developmental speech disabilities, who typically require sustained practice with a speech therapist for several years. To this end, over the past four years we have developed speech processing tools that automatically detect common errors in disordered speech. This paper presents an automated technique for identifying incorrect lexical stress. Specifically, we describe a deep neural network (DNN) that classifies the four bisyllabic stress patterns: strong-weak (SW), weak-strong (WS), strong-strong (SS) and weak-weak (WW). We derive input features for the DNN from the duration, pitch, intensity and spectral energy of each of the two consecutive syllables. Using these features, we achieve 93% correct classification between the SW and WS stress patterns and 88% correct classification of the four bisyllabic patterns on speech from typically developing children, and 73.4% correct classification between SW and WS in disordered speech. These figures represent a two-fold reduction in error rates compared to our prior work, which used a DNN with differential features from consecutive syllables.
Index Terms: deep neural network, prosody, lexical stress, automated speech therapy.
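To make the feature layout concrete, the following is a minimal Python sketch, not the implementation used in this work. It contrasts per-syllable features (the four measurements of each syllable, concatenated) with differential features, read here as simple differences between the two syllables, and passes a feature vector through a small feed-forward network with a four-way softmax. The `Syllable` fields, their units, the function names, the layer sizes and the placeholder weights are all assumptions made for illustration; the abstract does not specify how the measurements are computed or how the network is configured.

```python
import math
from dataclasses import dataclass

# The four bisyllabic stress patterns named in the abstract.
PATTERNS = ("SW", "WS", "SS", "WW")


@dataclass
class Syllable:
    """Hypothetical per-syllable measurements (field names and units are assumptions)."""
    duration_s: float
    pitch_hz: float
    intensity_db: float
    spectral_energy: float

    def as_list(self):
        return [self.duration_s, self.pitch_hz, self.intensity_db, self.spectral_energy]


def per_syllable_features(first, second):
    """Concatenate the measurements of both syllables (8 values)."""
    return first.as_list() + second.as_list()


def differential_features(first, second):
    """First-minus-second differences (4 values); one plausible reading of
    'differential features', not necessarily the prior work's definition."""
    return [a - b for a, b in zip(first.as_list(), second.as_list())]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, hidden_w, hidden_b, out_w, out_b):
    """One tanh hidden layer followed by a softmax over the four patterns."""
    hidden = [math.tanh(v) for v in dense(features, hidden_w, hidden_b)]
    probs = softmax(dense(hidden, out_w, out_b))
    best = max(range(len(PATTERNS)), key=probs.__getitem__)
    return PATTERNS[best], probs


# Illustrative values only; they are not taken from any data set.
first = Syllable(duration_s=0.25, pitch_hz=220.0, intensity_db=70.0, spectral_energy=0.5)
second = Syllable(duration_s=0.125, pitch_hz=180.0, intensity_db=62.0, spectral_energy=0.25)

features = per_syllable_features(first, second)

# Placeholder (untrained) weights: 8 inputs -> 3 hidden units -> 4 outputs.
hidden_w = [[0.0] * 8 for _ in range(3)]
hidden_b = [0.0] * 3
out_w = [[0.0] * 3 for _ in range(4)]
out_b = [0.0, 0.0, 1.0, 0.0]

label, probs = classify(features, hidden_w, hidden_b, out_w, out_b)
```

In this sketch the per-syllable layout keeps the absolute measurements of both syllables, whereas the differential layout retains only how the syllables differ, so two equally strong (SS) or equally weak (WW) syllables produce similar difference vectors. A trained model would replace the placeholder weights with learned parameters.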