A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit <a rel="external noopener" href="https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1733.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network
<span title="2020-10-25">2020</span>
<i title="ISCA">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/trpytsxgozamtbp7emuvz2ypra" style="color: black;">Interspeech 2020</a>
</i>
The manner in which humans encode emotion information within an utterance is often complex and can result in diverse salient acoustic profiles conditioned on emotion type. In this work, we propose a framework that imposes a graph attention mechanism on a gated recurrent unit network (GA-GRU) to improve utterance-based speech emotion recognition (SER). Our proposed GA-GRU combines long-range time-series based modeling of speech and further integrates complex saliency using a graph structure. We evaluate the proposed GA-GRU on the IEMOCAP and MSP-IMPROV databases and achieve 63.8% UAR and 57.47% UAR, respectively, in a four-class emotion recognition task. The GA-GRU obtains consistently better performance than recent state-of-the-art per-utterance emotion classification models, and we further observe that different emotion categories require distinct, flexible structures for modeling emotion information in acoustic data, beyond conventional left-to-right (or right-to-left) processing.
Index Terms: speech emotion recognition, graph, attention mechanism, recurrent neural network
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.21437/interspeech.2020-1733">doi:10.21437/interspeech.2020-1733</a>
<a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/interspeech/SuCLL20.html">dblp:conf/interspeech/SuCLL20</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/5soj4gpfr5cnpeglq63itm23nu">fatcat:5soj4gpfr5cnpeglq63itm23nu</a>
</span>
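The abstract describes the overall architecture (a bi-directional GRU over frame-level acoustic features, with graph attention used to pool salient frames into an utterance-level representation) but not its implementation details. Below is a minimal, illustrative PyTorch sketch of that kind of model; the node and edge construction, feature dimension, hidden size, neighbourhood window, and four-class output are assumptions made for the example and are not taken from the paper.

<pre><code>
# Illustrative sketch (not the authors' code): frame-level BiGRU states are
# treated as graph nodes, pooled with a single-head graph attention layer,
# then classified into emotion categories at the utterance level.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionPool(nn.Module):
    """Single-head graph attention over frame nodes, followed by mean pooling."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        # Attention scoring vector applied to concatenated node pairs.
        self.attn = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (B, T, D); adj: (B, T, T) boolean, True where an edge exists.
        h = self.proj(nodes)
        T = h.size(1)
        hi = h.unsqueeze(2).expand(-1, -1, T, -1)              # (B, T, T, D)
        hj = h.unsqueeze(1).expand(-1, T, -1, -1)              # (B, T, T, D)
        scores = self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1)  # (B, T, T)
        scores = scores.masked_fill(~adj, float("-inf"))       # keep only edges
        alpha = torch.softmax(F.leaky_relu(scores), dim=-1)    # per-node attention
        pooled = torch.bmm(alpha, h)                           # aggregate neighbours
        return pooled.mean(dim=1)                              # utterance vector


class BiGRUGraphSER(nn.Module):
    """Frame features -> BiGRU -> graph attention pooling -> emotion logits."""

    def __init__(self, feat_dim: int = 40, hidden: int = 128, n_classes: int = 4):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.pool = GraphAttentionPool(2 * hidden)
        self.cls = nn.Linear(2 * hidden, n_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        states, _ = self.gru(frames)                           # (B, T, 2*hidden)
        T = states.size(1)
        # Assumed edge set: each frame is connected to neighbours within +/- 2
        # steps, so attention is not restricted to a strict left-to-right path.
        idx = torch.arange(T)
        adj = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= 2
        adj = adj.unsqueeze(0).expand(states.size(0), -1, -1).to(states.device)
        return self.cls(self.pool(states, adj))


if __name__ == "__main__":
    model = BiGRUGraphSER()
    logits = model(torch.randn(2, 150, 40))   # 2 utterances, 150 frames, 40-dim features
    print(logits.shape)                       # torch.Size([2, 4])
</code></pre>

Treating frames as graph nodes, rather than a strictly ordered sequence, is what lets the attention pooling emphasize salient regions regardless of their position, which echoes the abstract's observation that emotion-relevant structure is not purely left-to-right.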
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201213170432/https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1733.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/42/cc/42cc1ad4bdfdeb90eca3920824ebc4e4dafb2671.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.21437/interspeech.2020-1733">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="external alternate icon"></i>
Publisher / doi.org
</button>
</a>