Multi-modal Graph Contrastive Learning for Micro-video Recommendation

Zixuan Yi, Xi Wang, Iadh Ounis, Craig Macdonald
2022 Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval  
Recently micro-videos have become more popular in social media platforms such as TikTok and Instagram. Engagements in these platforms are facilitated by multi-modal recommendation systems. Indeed, such multimedia content can involve diverse modalities, often represented as visual, acoustic, and textual features to the recommender model. Existing works in micro-video recommendation tend to unify the multi-modal channels, thereby treating each modality with equal importance. However, we argue
more » ... these approaches are not sufficient to encode item representations with multiple modalities, since the used methods cannot fully disentangle the users' tastes on different modalities. To tackle this problem, we propose a novel learning method named Multi-Modal Graph Contrastive Learning (MMGCL), which aims to explicitly enhance multi-modal representation learning in a self-supervised learning manner. In particular, we devise two augmentation techniques to generate the multiple views of a user/item: modality edge dropout and modality masking. Furthermore, we introduce a novel negative sampling technique that allows to learn the correlation between modalities and ensures the effective contribution of each modality. Extensive experiments conducted on two micro-video datasets demonstrate the superiority of our proposed MMGCL method over existing state-of-the-art approaches in terms of both recommendation performance and training convergence speed. CCS CONCEPTS • Information systems → Recommender systems.
doi:10.1145/3477495.3532027 fatcat:44ybm5ouzndfldmicjvvt4tgyu