Multi-level Deep Correlative Networks for Multi-modal Sentiment Analysis
Chinese Journal of Electronics
Multi-modal sentiment analysis (MSA) is increasingly becoming a research hotspot because it extends conventional text-based sentiment analysis (SA) to multi-modal content, which can provide richer affective information. However, compared with text-based sentiment analysis, multi-modal sentiment analysis poses many more challenges, because the joint learning process on multi-modal data requires both fine-grained semantic matching and effective heterogeneous feature fusion. Existing approaches typically infer the sentiment type from concatenated features extracted from different modalities, but neglect the strong semantic correlation among co-occurring data of different modalities. To address these challenges, a multi-level deep correlative network for multi-modal sentiment analysis is proposed, which can reduce the semantic gap by simultaneously analyzing the middle-level semantic features of images and the hierarchical deep correlations. First, the most relevant cross-modal feature representation is generated with multi-modal deep and discriminative correlation analysis (Multi-DDCA), while keeping the respective modal feature representations discriminative. Second, the high-level semantic outputs from Multi-DDCA are encoded into an attention-correlation cross-modal feature representation through a co-attention-based multi-modal correlation submodel, and they are then further merged by a multi-layer neural network to train a sentiment classifier that predicts sentiment categories. Extensive experimental results on five datasets demonstrate the effectiveness of the designed approach, which outperforms several state-of-the-art fusion strategies for sentiment analysis.
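To make the co-attention-based fusion step concrete, the following minimal NumPy sketch illustrates the general idea under stated assumptions; it is not the paper's implementation. The bilinear affinity matrix, the max-pooling over the affinity, and the weight matrix `W` are all illustrative choices: each modality attends over the other via a shared affinity, and the attended summaries are concatenated into a single cross-modal representation that a downstream classifier could consume.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(X, Y, W):
    """Hypothetical co-attention between text features X (n, d) and
    image-region features Y (m, d).

    A bilinear affinity C = tanh(X W Y^T) scores every text/image pair;
    max-pooling plus softmax turns it into attention weights for each
    modality over its own elements, conditioned on the other modality.
    """
    C = np.tanh(X @ W @ Y.T)          # (n, m) cross-modal affinity matrix
    a_x = softmax(C.max(axis=1))      # (n,) attention over text tokens
    a_y = softmax(C.max(axis=0))      # (m,) attention over image regions
    x_att = a_x @ X                   # (d,) attended text summary
    y_att = a_y @ Y                   # (d,) attended image summary
    return np.concatenate([x_att, y_att])  # fused cross-modal representation

# Toy example: 5 text tokens and 3 image regions, 8-dim features each.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
Y = rng.standard_normal((3, 8))
W = rng.standard_normal((8, 8))
fused = co_attention(X, Y, W)  # shape (16,), fed to the sentiment classifier
```

In the actual model the fused vector would pass through the multi-layer neural network mentioned in the abstract; here the sketch stops at the fused representation to keep the fusion mechanism itself visible.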