Automatic detection of speaker state: Lexical, prosodic, and phonetic approaches to level-of-interest and intoxication classification

William Yang Wang, Fadi Biadsy, Andrew Rosenberg, Julia Hirschberg
2013 Computer Speech and Language  
Traditional studies of speaker state focus primarily upon one-stage classification techniques using standard acoustic features. In this article, we investigate multiple novel features and approaches to two recent tasks in speaker state detection: level-of-interest (LOI) detection and intoxication detection. In the task of LOI prediction, we propose a novel Discriminative TFIDF feature to capture important lexical information and a novel Prosodic Event detection approach using AuToBI; we combine
these with acoustic features for this task using a new multilevel multistream prediction feedback and similarity-based hierarchical fusion learning approach. Our experimental results outperform the published results of all systems in the 2010 Interspeech Paralinguistic Challenge - Affect Subchallenge. In the intoxication detection task, we evaluate the performance of Prosodic Event-based, phone duration-based, phonotactic, and phonetic-spectral approaches, finding that a combination of the phonotactic and phonetic-spectral approaches achieves significant improvement over the 2011 Interspeech Speaker State Challenge - Intoxication Subchallenge baseline. We discuss our results using these new features and approaches and their implications for future research.

Acoustic features: 1582 acoustic features; for details see Schuller et al. (2010)
Prosodic and VQ:
  # Pulses, # periods, mean periods, SDev period
  Voicing fraction, # voice breaks, degree, Voiced2total frames
  Jitter local, local (absolute), RAP, PPQ5
  Shimmer local, local (dB), APQ3, APQ5, APQ11
  Harmonicity mean autocorrelation, Harmonicity mean NHR, mean NHR (dB)
  Duration seconds
  F0 min, max, mean, median, SDev, MAS
  Energy min, max, mean, SDev
Prosodic Events: Pitch accents, intermediate phrase, and intonational boundaries

VQ: Voice Quality; SDev: standard deviation; RAP: relative average perturbation; PPQ5: five-point period perturbation quotient; APQn: n-point amplitude perturbation quotient; NHR: noise-to-harmonics ratio; MAS: mean absolute slope.
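
As a rough illustration of the jitter and shimmer entries in the feature list, the sketch below computes jitter (local), jitter (local, absolute), RAP, and shimmer (local) from a sequence of glottal period durations and peak amplitudes. It follows the commonly used Praat-style definitions, which is an assumption here: the record does not say how the authors extracted these features. The function names and input values are hypothetical.

```python
from statistics import mean


def jitter_local_absolute(periods):
    """Mean absolute difference between consecutive periods (seconds)."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs)


def jitter_local(periods):
    """Jitter (local): absolute jitter divided by the mean period."""
    return jitter_local_absolute(periods) / mean(periods)


def jitter_rap(periods):
    """Relative average perturbation (RAP).

    Mean absolute difference between each period and the average of
    itself and its two neighbours, divided by the mean period.
    Needs at least three periods.
    """
    devs = [
        abs(periods[i] - mean(periods[i - 1:i + 2]))
        for i in range(1, len(periods) - 1)
    ]
    return mean(devs) / mean(periods)


def shimmer_local(amplitudes):
    """Shimmer (local): mean absolute difference between consecutive
    peak amplitudes, divided by the mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)


# Hypothetical period durations (seconds) and peak amplitudes.
example_periods = [0.010, 0.011, 0.010, 0.011]
example_amplitudes = [0.50, 0.52, 0.49, 0.51]

if __name__ == "__main__":
    print("jitter (local):", jitter_local(example_periods))
    print("jitter (local, absolute):", jitter_local_absolute(example_periods))
    print("RAP:", jitter_rap(example_periods))
    print("shimmer (local):", shimmer_local(example_amplitudes))
```

A perfectly periodic signal gives zero for all four measures, so larger values indicate more cycle-to-cycle irregularity in pitch or amplitude.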
doi:10.1016/j.csl.2012.03.004 fatcat:xmkbyv6zv5bydh76e253i2gepy