A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation

Mohit Shah, Lifeng Miao, Chaitali Chakrabarti, Andreas Spanias
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
In this paper, we present a speech-based emotion recognition framework based on a latent Dirichlet allocation model. This method assumes that incoming speech frames are conditionally independent and exchangeable. While this leads to a loss of temporal structure, it is able to capture significant statistical information between frames. In contrast, a hidden Markov model-based approach captures the temporal structure in speech. Using the German emotional speech database EMO-DB for evaluation, we achieve an average classification accuracy of 80.7% compared to 73% for hidden Markov models. This improvement is achieved at the cost of a slight increase in computational complexity. We map the proposed algorithm onto an FPGA platform and show that emotions in a speech utterance of duration 1.5s can be identified in 1.8ms, while utilizing 70% of the resources. This further demonstrates the suitability of our approach for real-time applications on hand-held devices.
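The bag-of-frames pipeline described in the abstract can be sketched in a few steps: quantize per-frame acoustic features into a discrete codebook, discard frame order (the exchangeability assumption), infer LDA topic proportions per utterance, and classify from those proportions. The sketch below is a minimal illustration using scikit-learn on synthetic MFCC-like vectors; the emotion-dependent feature shift, the codebook size, the topic count, and the logistic-regression classifier are all assumptions for demonstration, not the paper's actual configuration or the EMO-DB data.

```python
# Minimal sketch of an LDA bag-of-frames emotion classifier.
# Synthetic stand-in data; a real system would extract MFCCs from EMO-DB audio.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_utterance(emotion, n_frames=150, dim=13):
    # Hypothetical: each "emotion" shifts the frame distribution slightly.
    return rng.normal(loc=emotion * 0.5, scale=1.0, size=(n_frames, dim))

emotions = [0, 1, 2]  # e.g. anger / happiness / neutral (placeholder labels)
utts, labels = [], []
for e in emotions:
    for _ in range(20):
        utts.append(make_utterance(e))
        labels.append(e)

# Step 1: vector-quantize frames into a discrete codebook ("words").
codebook = KMeans(n_clusters=64, n_init=10, random_state=0)
codebook.fit(np.vstack(utts))

# Step 2: each utterance becomes a bag-of-frames count vector (a "document");
# frame order is discarded, matching the conditional-independence assumption.
def to_counts(u):
    return np.bincount(codebook.predict(u), minlength=64)

X_counts = np.array([to_counts(u) for u in utts])

# Step 3: LDA infers per-utterance topic proportions.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
X_topics = lda.fit_transform(X_counts)

# Step 4: classify emotions from topic proportions.
clf = LogisticRegression(max_iter=1000).fit(X_topics, labels)
print("train accuracy:", clf.score(X_topics, labels))
```

This captures the statistical structure the abstract refers to (which codebook words co-occur within an utterance) while deliberately ignoring temporal ordering, unlike an HMM.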
doi:10.1109/icassp.2013.6638116 dblp:conf/icassp/ShahMCS13 fatcat:efylch6gk5fhteiwqlbcrk5fcm