Prediction of Sleepiness Ratings from Voice by Man and Machine

Mark Huckvale, András Beke, Mirei Ikushima
2020 Interspeech 2020  
This paper looks in more detail at the Interspeech 2019 computational paralinguistics challenge on the prediction of sleepiness ratings from speech. In this challenge, teams were asked to train a regression model to predict sleepiness from samples of the Düsseldorf Sleepy Language Corpus (DSLC). This challenge was notable because the performance of all entrants was uniformly poor, with even the winning system only achieving a correlation of r=0.37. We look at whether the task itself is achievable, and whether the corpus is suited to training a machine learning system for the task. We perform a listening experiment using samples from the corpus and show that a group of human listeners can achieve a correlation of r=0.7 on this task, although this is mainly by classifying the recordings into one of three sleepiness groups. We show that the corpus, because of its construction, confounds variation with sleepiness and variation with speaker identity, and this was the reason that machine learning systems failed to perform well. We conclude that sleepiness rating prediction from voice is not an impossible task, but that good performance requires more information about sleepy speech and its variability across listeners than is available in the DSLC corpus.
doi:10.21437/interspeech.2020-1601 dblp:conf/interspeech/HuckvaleBI20 fatcat:wmusbd4yljbdbg2jj3bxvx5h6m