Phoneme State Posteriorgram Features for Speech Based Automatic Classification of Speakers in Cold and Healthy Condition

Akshay Kalkunte Suresh, Srinivasa Raghavan K.M., Prasanta Kumar Ghosh
2017, Interspeech 2017, unpublished
We consider the problem of automatically detecting from speech whether a speaker is suffering from a common cold. When a speaker has cold symptoms, their voice quality changes relative to their normal voice. We hypothesize that this change is reflected in lower likelihoods under a model built from normal speech. To capture this, we compute a 120-dimensional posteriorgram feature for each frame using Gaussian mixture models from the 120 states of 40 three-state phonetic hidden Markov models trained on approximately 16.4 hours of normal English speech. A fixed 5160-dimensional phoneme state posteriorgram (PSP) feature vector is then obtained for each utterance by computing statistics over the posteriorgram feature trajectory. Experiments on the 2017-Cold sub-challenge data show that when the decisions from bag-of-audio-words (BoAW) and end-to-end (e2e) systems are combined with those from PSP features using an unweighted majority rule, the unweighted average recall (UAR) on the development set reaches 69%, which is 2.9% (absolute) better than the best UAR obtained by the baseline schemes. When the decisions from ComParE, BoAW and PSP features are combined using a simple majority rule, the UAR on the test set is 68.52%.
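
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions. Per-frame state log-likelihoods are taken as given rather than computed from trained GMM-HMMs, and state priors are assumed uniform. The utterance-level statistics (mean, standard deviation, minimum, maximum) are placeholders: the abstract does not list the statistics used, only that the final vector has 5160 dimensions (which equals 120 x 43, so presumably 43 statistics per posteriorgram dimension). All function names are hypothetical.

```python
import numpy as np


def state_posteriors(log_likes, log_priors=None):
    """Turn per-frame state log-likelihoods of shape (T, S) into posteriors.

    Assumption: uniform state priors unless log_priors (shape (S,)) is given.
    """
    log_likes = np.asarray(log_likes, dtype=float)
    n_states = log_likes.shape[1]
    if log_priors is None:
        log_priors = np.full(n_states, -np.log(n_states))
    joint = log_likes + log_priors
    # Subtract the per-frame maximum before exponentiating for numerical stability.
    joint = joint - joint.max(axis=1, keepdims=True)
    post = np.exp(joint)
    return post / post.sum(axis=1, keepdims=True)


def utterance_stats(post):
    """Collapse a (T, S) posteriorgram into one fixed-length vector.

    Placeholder statistics (4 per state); the paper's actual set is not
    given in the abstract.
    """
    return np.concatenate(
        [post.mean(axis=0), post.std(axis=0), post.min(axis=0), post.max(axis=0)]
    )


def majority_vote(decisions):
    """Unweighted majority rule over binary decisions, shape (n_systems, n_utts).

    Assumes an odd number of systems so that ties cannot occur.
    """
    decisions = np.asarray(decisions)
    return (2 * decisions.sum(axis=0) > decisions.shape[0]).astype(int)


def uar(y_true, y_pred):
    """Unweighted average recall: the mean of the per-class recalls."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_log_likes = rng.normal(size=(200, 120))  # 200 frames, 120 states (synthetic)
    features = utterance_stats(state_posteriors(fake_log_likes))
    print(features.shape)  # 4 placeholder statistics x 120 states
```

With three classifiers, `majority_vote` marks an utterance as "cold" only when at least two of them agree, which mirrors the unweighted fusion of the BoAW, e2e and PSP decisions described above. UAR is used rather than accuracy because it weights both classes equally regardless of how many utterances each contains.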
doi:10.21437/interspeech.2017-1550 fatcat:jeblhdb53zg4bk7p2fxljhfkue