A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
Continuous dimensional emotion prediction is a challenging task where the fusion of various modalities usually achieves state-of-the-art performance such as early fusion or late fusion. In this paper, we propose a novel multi-modal fusion strategy named conditional attention fusion, which can dynamically pay attention to different modalities at each time step. Long-short term memory recurrent neural networks (LSTM-RNN) is applied as the basic uni-modality model to capture long timearXiv:1709.02251v1 fatcat:gyk5bx76xnahnfz6gq2c6pw2ka