HMTL: Heterogeneous Modality Transfer Learning for Audio-visual Sentiment Analysis

Sanghyun Seo, Sanghyuck Na, Juntae Kim
2020 IEEE Access  
Multimodal sentiment analysis is an extended approach to traditional language-based sentiment analysis, which uses other relevant modality data. Multimodal sentiment analysis usually applies visual, textual, and acoustic representations for sentiment prediction. Recently, various data fusion methodologies have been proposed for multimodal sentiment analysis. In most cases, textual modality plays a major role, and visual and acoustic modalities are used as auxiliary sources for multimodal sentiment analysis. However, in general multimedia such as video, text transcripts of an individual's speech are not provided. Research on an audio-visual sentiment analysis methodology that does not depend on text modality is essential for multimodal sentiment analysis in real-world industrial applications. Therefore, it is important to improve audio-visual sentiment analysis because it currently exhibits lower performance than multimodal sentiment analysis, including text modality. In this paper, we propose heterogeneous modality transfer learning (HMTL) to utilize the knowledge of aligned text data as a source modality in transfer learning to improve audio-visual sentiment analysis performance. Our approach uses a decoder and adversarial learning techniques to reduce the gap between the source and target modalities in the embedded space for multimodal representation. Our proposed methodology experimentally outperformed recent unimodal and bimodal audio-visual sentiment analysis achievements.
INDEX TERMS Multimodal sentiment analysis, heterogeneous transfer learning, data fusion.
doi:10.1109/access.2020.3006563 fatcat:qs2ld6ly3nhppmejwlbhy36guu