Alpha at SemEval-2021 Task 6: Transformer Based Propaganda Classification

Zhida Feng, Jiji Tang, Jiaxiang Liu, Weichong Yin, Shikun Feng, Yu Sun, Li Chen
2021. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
This paper describes our system for Task 6 of SemEval-2021, which focuses on multimodal propaganda technique classification: given an image and its text, the goal is to classify the pair into 22 classes. We propose a transformer-based architecture (Vaswani et al., 2017) to fuse the clues from both image and text, and explore two branches of techniques: fine-tuning a text pre-trained transformer with extended visual features, and fine-tuning multimodal pre-trained transformers. For the visual features, we experiment with both grid features extracted from a ResNet (He et al., 2016) network and salient region features from a pre-trained object detector. Among the pre-trained multimodal transformers, we choose ERNIE-ViL (Yu et al., 2020), a two-stream cross-attended transformer model pre-trained on large-scale image-caption aligned data. Fine-tuning ERNIE-ViL for our task yields better performance, owing to the general joint multimodal representation of text and image learned by ERNIE-ViL. In addition, since the distribution of the classification labels is extremely unbalanced, we also experiment with the loss function and find that focal loss performs better than cross-entropy loss. Our system ranked first in sub-task C of the final competition.
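The abstract's only concretely specified modeling choice is replacing cross-entropy with focal loss to handle the heavily imbalanced 22-class label distribution. Below is a minimal PyTorch sketch of binary focal loss over a multi-label classification head; the function name, the multi-label (sigmoid) formulation, and the gamma/alpha values are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al., 2017) for multi-label logits.

    logits:  (batch, num_classes) raw classifier scores
    targets: (batch, num_classes) 0/1 multi-hot labels
    gamma, alpha: common default hyperparameters (assumed, not from the paper)
    """
    # Unreduced per-element binary cross-entropy, so it can be reweighted.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t: the model's probability assigned to the true label of each element.
    probs = torch.sigmoid(logits)
    p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
    # Class-balance weight and modulating factor that down-weights easy examples.
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()


# Usage example: scores for a batch of 4 image-text pairs over 22 labels.
logits = torch.randn(4, 22)
targets = torch.randint(0, 2, (4, 22)).float()
print(focal_loss(logits, targets))
```

Compared with plain cross-entropy, the (1 - p_t)^gamma factor shrinks the contribution of well-classified examples, so the rare propaganda-technique labels dominate the gradient less rarely, which is the usual motivation for focal loss under label imbalance.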
doi:10.18653/v1/2021.semeval-1.8