Quantum-inspired Multimodal Fusion for Video Sentiment Analysis [article]

Qiuchi Li, Dimitris Gkoumas, Christina Lioma, Massimo Melucci
2021-03-22 · arXiv · pre-print
We tackle the challenge of fusing features from different modalities for multimodal sentiment analysis. Existing approaches, mostly based on neural networks, largely model multimodal interactions in an implicit, hard-to-interpret manner. We address this limitation with inspiration from quantum theory, which provides principled methods for modeling complex interactions and correlations. In our quantum-inspired framework, word interaction within a single modality and interaction across modalities are formulated with superposition and entanglement, respectively, at different stages. The complex-valued neural network implementation of the framework achieves results comparable to state-of-the-art systems on two benchmark video sentiment analysis datasets. In addition, we produce unimodal and bimodal sentiment directly from the model to interpret the entangled decision.
arXiv:2103.10572v2 (https://arxiv.org/abs/2103.10572v2) · fatcat:hldyp5i35jhwrdtc7gvneb353m (https://fatcat.wiki/release/hldyp5i35jhwrdtc7gvneb353m)
