Aggressive and Offensive Language Identification in Hindi, Bangla, and English: A Comparative Study

Ritesh Kumar, Bornini Lahiri, Atul Kr. Ojha
2021 SN Computer Science  
In the present paper, we carry out a comparative study between offensive and aggressive language and attempt to understand their inter-relationship. To carry out this study, we develop classifiers for offensive and aggressive language identification in Hindi, Bangla, and English using the datasets released for the languages as part of the two shared tasks: hate speech and offensive content identification in Indo-European languages (HASOC) and aggression and misogyny identification task at
more » ... . The HASOC dataset is annotated with the information about offensive language and TRAC-2 dataset is annotated with the information about aggressive language. We experiment with SVM as well as BERT and its different derivatives such as ALBERT and DistilBERT for developing the classifiers. The best classifiers achieve an impressive F-score in between 0.70 and 0.80 for different tasks. We use these classifiers to cross-annotate the two datasets, and look at the co-occurrence of different sub-categories of aggression and offense. The study shows that even though aggression and offense significantly overlaps, but still one does not entail the other.
doi:10.1007/s42979-020-00414-6 fatcat:lc7ikczaubd2xagnrwpdw54nmu