Semi-Supervised Image Spam Hunter: A Regularized Discriminant EM Approach [chapter]

Yan Gao, Ming Yang, Alok Choudhary
2009 Lecture Notes in Computer Science  
Image spam is a new trend in the family of email spam. The new image spams employ a variety of image processing techniques to create random noise. In this paper, we propose a semi-supervised approach, the regularized discriminant EM algorithm (RDEM), to detect image spam emails. It leverages a small amount of labeled data and a large amount of unlabeled data to identify spam and train a classification model simultaneously. Compared with fully supervised learning algorithms, the semi-supervised learning algorithm is better suited to adversarial classification problems, because spammers actively protect their work by constantly making changes to circumvent spam detection. This makes it too costly for fully supervised learning to collect sufficient labeled data for training as often as needed. Experimental results demonstrate that our approach achieves a detection rate of 91.66% with a false positive rate of less than 2.96%, while significantly reducing the labeling cost.
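As a rough illustration of learning from a few labeled and many unlabeled samples at the same time, here is a minimal sketch of plain semi-supervised EM with diagonal Gaussian class models. It is not the authors' RDEM: it omits the discriminant projection step entirely, and the additive variance term `reg` is only an assumed stand-in for whatever regularization the paper actually uses. The feature vectors and the names `semi_supervised_em`, `class_log_scores`, `predict`, and `reg` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def class_log_scores(x: Vector, priors, means, vars_) -> List[float]:
    """Unnormalized log posterior log p(k) + log p(x | k) for every class k."""
    return [
        math.log(priors[k]) + log_gauss_diag(x, means[k], vars_[k])
        for k in range(len(priors))
    ]


def semi_supervised_em(
    X: List[Vector],
    y: List[Optional[int]],
    n_classes: int = 2,
    n_iter: int = 30,
    reg: float = 1e-2,
):
    """Generic semi-supervised EM (hypothetical sketch, NOT the paper's RDEM).

    X: feature vectors (hypothetical image features); y: class index for
    labeled samples, None for unlabeled ones. `reg` is added to every
    variance as an assumed ridge-style regularizer.
    """
    if any(k not in y for k in range(n_classes)):
        raise ValueError("every class needs at least one labeled example")
    d = len(X[0])
    # Responsibilities: labeled rows are fixed one-hot vectors; unlabeled rows
    # start at zero weight so the first M-step fits the labeled data only.
    R = [
        [0.0] * n_classes if yi is None else [float(k == yi) for k in range(n_classes)]
        for yi in y
    ]
    for _ in range(n_iter):
        # M-step: weighted class priors, means, and regularized variances.
        totals = [sum(r[k] for r in R) for k in range(n_classes)]
        grand = sum(totals)
        priors = [t / grand for t in totals]
        means, vars_ = [], []
        for k in range(n_classes):
            nk = totals[k]
            mean = [sum(r[k] * x[j] for r, x in zip(R, X)) / nk for j in range(d)]
            var = [
                sum(r[k] * (x[j] - mean[j]) ** 2 for r, x in zip(R, X)) / nk + reg
                for j in range(d)
            ]
            means.append(mean)
            vars_.append(var)
        # E-step: soft-label only the unlabeled samples (shift by max for stability).
        for i, yi in enumerate(y):
            if yi is not None:
                continue
            scores = class_log_scores(X[i], priors, means, vars_)
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            z = sum(exps)
            R[i] = [e / z for e in exps]
    return priors, means, vars_, R


def predict(x: Vector, priors, means, vars_) -> int:
    """Assign x to the class with the highest posterior score."""
    scores = class_log_scores(x, priors, means, vars_)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy data: class 0 near the origin, class 1 near (5, 5); four rows unlabeled.
    X = [[0.0, 0.0], [0.2, -0.1], [5.0, 5.0], [4.8, 5.2],
         [0.1, 0.3], [-0.2, 0.1], [5.1, 4.9], [4.9, 5.1]]
    y = [0, 0, 1, 1, None, None, None, None]
    priors, means, vars_, R = semi_supervised_em(X, y)
    print(predict([0.3, 0.2], priors, means, vars_))
```

In this sketch the labeled rows keep their one-hot responsibilities throughout, so they anchor each class, while the unlabeled rows contribute soft-weighted evidence to the class means and variances. That division of labor is the general reason semi-supervised methods can get by with fewer labels; how RDEM itself realizes it should be taken from the paper.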
doi:10.1007/978-3-642-03348-3_17