Detecting offensive tweets via topical feature discovery over a large scale twitter corpus

Guang Xiang, Bin Fan, Ling Wang, Jason Hong, Carolyn Rose
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content in Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively with a variety of machine learning (ML) algorithms. For instance, our approach achieves a true positive rate (TP) of 75.1% over 4029 testing tweets using Logistic Regression, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at the same level as the baseline at about 3.77%. Our approach provides an alternative to large scale hand annotation efforts required by fully supervised learning approaches.
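
To make the two reported metrics concrete, the following is a minimal sketch of a keyword matching classifier and of how a true positive rate and a false positive rate are computed from its predictions. It is not the authors' implementation: the lexicon, the toy tweets, their labels, and the names `keyword_match` and `tp_fp_rates` are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical placeholder lexicon; the keyword list used in the paper is not given here.
LEXICON = frozenset({"darn", "heck"})


def keyword_match(tweet, lexicon=LEXICON):
    """Flag a tweet as offensive if any of its tokens appears in the lexicon."""
    tokens = (tok.strip(".,!?;:\"'()").lower() for tok in tweet.split())
    return any(tok in lexicon for tok in tokens)


def tp_fp_rates(predicted, actual):
    """Return (true positive rate, false positive rate) for boolean labels."""
    tp = fp = pos = neg = 0
    for p, a in zip(predicted, actual):
        if a:
            pos += 1
            tp += int(p)   # offensive tweet that was flagged
        else:
            neg += 1
            fp += int(p)   # clean tweet that was flagged
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr


# Toy labeled data (invented for this sketch): (tweet text, is_offensive).
DATA = [
    ("What the heck is this", True),
    ("You are such a h3ck", True),     # spelling variant absent from the lexicon
    ("Lovely weather today", False),
    ("Darn, missed the bus", False),   # lexicon word in a tweet labeled clean
]

predicted = [keyword_match(text) for text, _ in DATA]
actual = [label for _, label in DATA]
tpr, fpr = tp_fp_rates(predicted, actual)
print(f"TP rate = {tpr:.1%}, FP rate = {fpr:.1%}")
```

On this toy data the keyword matcher misses one offensive tweet and flags one clean tweet, which illustrates the kind of error a fixed lexicon can make; comparing two classifiers at a matched FP rate, as the abstract does, isolates the difference in TP rate.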
doi:10.1145/2396761.2398556 dblp:conf/cikm/XiangFWHR12 fatcat:f347mar4tjaflmwctjbhpxj2vi