Emotional speech characterization based on multi-features fusion for face-to-face interaction

Ammar Mahdhaoui, Fabien Ringeval, Mohamed Chetouani
2009 3rd International Conference on Signals, Circuits and Systems (SCS)
Speech contains nonverbal elements known as paralanguage, including voice quality, emotion and speaking style, as well as prosodic features such as rhythm, intonation and stress. The study of nonverbal communication has focused on face-to-face interaction because the behaviors of communicators play a major role during social interaction and convey information between the speakers. In this paper, we describe a computational framework for combining different features for ... speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the duration of the individual speech segments. This strategy is applied to a real-life application: detection of Italian motherese in authentic and longitudinal parent-infant interaction at home. The results suggest that short- and long-term information provide a robust and efficient time-scale analysis. A similar fusion methodology is also investigated using a phonetic-specific characterization process. This strategy is motivated by the fact that there are variations across emotional states at the phoneme level. A time scale based on both vowels and consonants is proposed, and it provides a relevant discriminant feature space for acted emotion recognition.
doi:10.1109/icscs.2009.5412691 fatcat:dxq3r2ealfavfhc2fp4j3o5fwm
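
The abstract describes a fusion in which local a posteriori class probabilities are combined with weights tied to segment duration. The sketch below illustrates one plausible reading of that idea; it is not the authors' implementation. It assumes the weights are simply proportional to segment duration and normalized to sum to one. The function names `fuse_posteriors` and `decide`, the class labels, and the example numbers are all hypothetical.

```python
from typing import List, Sequence


def fuse_posteriors(posteriors: Sequence[Sequence[float]],
                    durations: Sequence[float]) -> List[float]:
    """Combine per-segment class posteriors into one score per class.

    Assumption (not stated in the abstract): each segment's weight is its
    duration divided by the total duration of all segments.
    """
    if not posteriors or len(posteriors) != len(durations):
        raise ValueError("need one duration per segment, and at least one segment")
    total = float(sum(durations))
    if total <= 0.0:
        raise ValueError("total duration must be positive")

    n_classes = len(posteriors[0])
    fused = [0.0] * n_classes
    for probs, dur in zip(posteriors, durations):
        if len(probs) != n_classes:
            raise ValueError("all segments must score the same set of classes")
        weight = dur / total  # longer segments contribute more
        for k in range(n_classes):
            fused[k] += weight * probs[k]
    return fused


def decide(posteriors: Sequence[Sequence[float]],
           durations: Sequence[float],
           labels: Sequence[str]) -> str:
    """Return the label with the highest duration-weighted fused score."""
    fused = fuse_posteriors(posteriors, durations)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical example: three segments, two classes.
segment_posteriors = [[0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
segment_durations = [0.2, 1.0, 0.8]  # seconds
labels = ["motherese", "other"]

fused_scores = fuse_posteriors(segment_posteriors, segment_durations)
decision = decide(segment_posteriors, segment_durations, labels)
```

In this example the short first segment strongly favors the first class, but the two longer segments dominate the weighted sum, so the fused decision goes to the second class. Other weightings (for instance, a nonlinear function of duration) would fit the abstract's wording equally well.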