Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network

Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin, Chi-Chun Lee
2020, Interspeech 2020 (ISCA)
The way humans encode emotional information within an utterance is often complex and can result in diverse salient acoustic profiles that are conditioned on the emotion type. In this work, we propose a framework that imposes a graph attention mechanism on a gated recurrent unit network (GA-GRU) to improve utterance-based speech emotion recognition (SER). Our proposed GA-GRU combines long-range time-series modeling of speech and further integrates complex saliency using a graph ... ture. We evaluate GA-GRU on the IEMOCAP and MSP-IMPROV databases and achieve 63.8% and 57.47% unweighted average recall (UAR), respectively, on a four-class emotion recognition task. GA-GRU consistently outperforms recent state-of-the-art per-utterance emotion classification models, and we further observe that different emotion categories require distinct, flexible structures for modeling emotion information in the acoustic data, beyond the conventional left-to-right or right-to-left ordering.

Index Terms: speech emotion recognition, graph, attention mechanism, recurrent neural network

In the following sections, we will briefly introduce the two datasets and the acoustic features used in this work.
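The abstract describes imposing graph attention on GRU outputs so that frames can attend to each other along structures other than a strict left-to-right chain. Its exact formulation is truncated above, so the following is only a minimal sketch under stated assumptions: it swaps in a generic single-head layer in the style of graph attention networks (GAT), treats each frame's hidden state as a graph node, and mean-pools the result into one utterance vector. The inputs are random stand-ins for bidirectional GRU outputs, and every function name (`graph_attention`, `mean_pool`, etc.) is hypothetical, not taken from the paper.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(H, W, a, neighbors):
    """Hypothetical single-head GAT-style layer over frame nodes.

    H: T hidden-state vectors (assumed to be BiGRU outputs, dim d).
    W: projection matrix with d_out rows of length d.
    a: attention vector of length 2 * d_out.
    neighbors: neighbors[i] lists the frame indices node i attends to.
    Returns the T updated node vectors and each node's attention weights.
    """
    Z = [matvec(W, h) for h in H]
    d_out = len(W)
    a_src, a_dst = a[:d_out], a[d_out:]
    out, weights = [], []
    for i, nbrs in enumerate(neighbors):
        # e_ij = LeakyReLU(a^T [z_i || z_j]) for every neighbor j of node i
        scores = [leaky_relu(sum(p * q for p, q in zip(a_src, Z[i])) +
                             sum(p * q for p, q in zip(a_dst, Z[j])))
                  for j in nbrs]
        alpha = softmax(scores)
        # New node vector: attention-weighted sum of neighbor projections
        out.append([sum(al * Z[j][k] for al, j in zip(alpha, nbrs))
                    for k in range(d_out)])
        weights.append(alpha)
    return out, weights


def mean_pool(vectors):
    """Average node vectors into one utterance-level vector (an assumption)."""
    T = len(vectors)
    return [sum(v[k] for v in vectors) / T for k in range(len(vectors[0]))]


# Usage with random stand-in data: 5 frames, 4-dim states, 3-dim projection
random.seed(0)
T, d, d_out = 5, 4, 3
H = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
W = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(d_out)]
a = [random.gauss(0, 0.5) for _ in range(2 * d_out)]

# Fully connected graph (with self-loops): every frame may attend to every frame
full = [list(range(T)) for _ in range(T)]
H_new, attn = graph_attention(H, W, a, full)
utterance_vec = mean_pool(H_new)

# Causal chain for comparison: frame i only sees frames 0..i (left-to-right)
causal = [list(range(i + 1)) for i in range(T)]
H_causal, attn_causal = graph_attention(H, W, a, causal)
```

Passing the neighbor lists explicitly is a design choice for the sketch: it makes the contrast between a conventional left-to-right structure and a freer graph structure a matter of changing one argument, which mirrors the abstract's observation that different emotions may favor different structures.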
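The reported scores are unweighted average recall (UAR): the recall of each emotion class averaged with equal weight, so a large class cannot dominate the score the way it can with plain accuracy. A minimal sketch of the computation follows; the function name is hypothetical and the eight labels are made-up illustrative data, not drawn from IEMOCAP or MSP-IMPROV.

```python
from collections import Counter


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class counts equally regardless of size."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / n for c, n in support.items()]
    return sum(recalls) / len(recalls)


# Hypothetical four-class example with an over-represented "neu" class
y_true = ["ang", "ang", "hap", "neu", "neu", "neu", "neu", "sad"]
y_pred = ["ang", "neu", "hap", "neu", "neu", "sad", "neu", "sad"]

uar = unweighted_average_recall(y_true, y_pred)  # (0.5 + 1.0 + 0.75 + 1.0) / 4
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 6 / 8
```

Here UAR is 0.8125 while accuracy is 0.75, showing that the two metrics can diverge when class sizes are imbalanced.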
doi:10.21437/interspeech.2020-1733 | dblp:conf/interspeech/SuCLL20 | fatcat:5soj4gpfr5cnpeglq63itm23nu
