Time-Scale Feature Extractions for Emotional Speech Characterization

Mohamed Chetouani, Ammar Mahdhaoui, Fabien Ringeval
2009 Cognitive Computation  
Emotional speech characterization is an important issue for the understanding of interaction. This article discusses the time-scale analysis problem in feature extraction for emotional speech processing. We describe a computational framework for combining segmental and supra-segmental features for emotional speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the ... on of the individual speech segments. This strategy is applied to a real-world application: detection of Italian motherese in authentic and longitudinal parent-infant interaction at home. The results suggest that short- and long-term information, represented respectively by the short-term spectrum and the prosody parameters (fundamental frequency and energy), provides a robust and efficient time-scale analysis. A similar fusion methodology is also investigated through a phonetic-specific characterization process. This strategy is motivated by the fact that there are variations across emotional states at the phoneme level. A time scale based on both vowels and consonants is proposed, and it provides a relevant and discriminant feature space for acted emotion recognition. The experimental results on two different databases, Berlin (German) and Aholab (Basque), show that the best performance is obtained by our phoneme-dependent approach. These findings demonstrate the relevance of taking into account phoneme dependency (vowels/consonants) for emotional speech characterization.
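The fusion step described in the abstract can be illustrated with a minimal sketch: each speech segment contributes a vector of local a posteriori class probabilities, and the overall decision is taken on a weighted combination of them. This is not the authors' implementation. The function names `fuse_segment_posteriors` and `decide` are hypothetical, the linear weighted average is an assumed combination rule, and the per-segment weights are left as a plain input because the abstract text is truncated at the point where it says what they relate to.

```python
import numpy as np


def fuse_segment_posteriors(posteriors, weights):
    """Combine per-segment class posteriors into one utterance-level vector.

    posteriors: array-like of shape (n_segments, n_classes); each row holds the
                local a posteriori class probabilities of one segment.
    weights:    array-like of shape (n_segments,); non-negative weighting
                factors (ASSUMPTION: supplied by the caller, the abstract does
                not fully specify how they are computed).
    """
    p = np.asarray(posteriors, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.ndim != 2 or w.shape != (p.shape[0],):
        raise ValueError("expected (n_segments, n_classes) posteriors and (n_segments,) weights")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()   # normalise so the fused vector still sums to 1
    return w @ p      # weighted average over segments (assumed fusion rule)


def decide(posteriors, weights):
    """Return (index of the winning class, fused posterior vector)."""
    fused = fuse_segment_posteriors(posteriors, weights)
    return int(np.argmax(fused)), fused


# Three segments, two classes (e.g. class 0 vs. class 1).
segment_posteriors = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
label, fused = decide(segment_posteriors, weights=[1.0, 1.0, 2.0])
```

Changing the weights changes which segments dominate the decision: with weights `[5.0, 1.0, 1.0]` the first segment outweighs the other two and the fused decision flips to class 0.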
doi:10.1007/s12559-009-9016-9 fatcat:gxmttfmq7rb4bdehpg5la5suni