Automatic Personalized Spam Filtering through Significant Word Modeling

Khurum Naz Junejo, Asim Karim
2007 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007)
Typically, spam filters are built on the assumption that the characteristics of e-mails in the training set are identical to those in the individual users' inboxes to which the filters will be applied. This assumption is often incorrect, leading to poor filter performance. A personalized spam filter is built by taking into account the characteristics of e-mails in individual users' inboxes. We present an automatic approach for personalized spam filtering that does not require users' feedback. The proposed algorithm builds a statistical model of significant spam and non-spam words from the labeled training set and then updates it in multiple passes over the individual user's unlabeled inbox. The personalization of the model leads to improved filtering performance. We evaluate our algorithm on two publicly available datasets. The results show that our algorithm is robust and scalable, and a viable solution to the server-side personalized spam filtering problem. Moreover, it outperforms published results on one dataset and performs on par with published results on the second dataset.
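The two-stage idea described in the abstract (learn significant words from labeled mail, then refine the model over an unlabeled inbox) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define "significant", so the smoothed document-frequency ratio, the `threshold` value, the scoring rule, and the names `word_weights`, `classify`, and `personalize` are all assumptions made for illustration, as is the toy data.

```python
from collections import Counter

def word_weights(docs, labels, threshold=1.5):
    """Weight words that are much more frequent in one class (assumed criterion).

    Positive weight = significant spam word, negative = significant non-spam word.
    """
    spam = [set(d.lower().split()) for d, y in zip(docs, labels) if y == 1]
    ham = [set(d.lower().split()) for d, y in zip(docs, labels) if y == 0]
    spam_df = Counter(w for d in spam for w in d)
    ham_df = Counter(w for d in ham for w in d)
    weights = {}
    for w in set(spam_df) | set(ham_df):
        # Laplace-smoothed fraction of documents in each class containing w.
        p_spam = (spam_df[w] + 1) / (len(spam) + 2)
        p_ham = (ham_df[w] + 1) / (len(ham) + 2)
        ratio = p_spam / p_ham
        if ratio >= threshold:
            weights[w] = ratio
        elif 1 / ratio >= threshold:
            weights[w] = -1 / ratio
    return weights

def classify(doc, weights):
    """Label a message 1 (spam) if its significant-word score is positive."""
    score = sum(weights.get(w, 0.0) for w in set(doc.lower().split()))
    return 1 if score > 0 else 0

def personalize(train_docs, train_labels, inbox_docs, passes=3):
    """Label the inbox with the training model, then re-estimate the model
    from the inbox's own predicted labels for a fixed number of passes."""
    weights = word_weights(train_docs, train_labels)
    labels = [classify(d, weights) for d in inbox_docs]
    for _ in range(passes):
        weights = word_weights(inbox_docs, labels)
        labels = [classify(d, weights) for d in inbox_docs]
    return labels

# Toy data (hypothetical): the inbox uses spam vocabulary unseen in training.
train_docs = ["cheap pills offer", "cheap offer now",
              "meeting agenda notes", "project meeting tomorrow"]
train_labels = [1, 1, 0, 0]
inbox = ["cheap watches replica", "replica watches sale",
         "agenda for project review", "review notes attached"]

general = [classify(d, word_weights(train_docs, train_labels)) for d in inbox]
personal = personalize(train_docs, train_labels, inbox)
print(general, personal)
```

In this toy run the training-set model misses the second inbox message because none of its words appeared in training, while the personalized passes pick up "watches" and "replica" from the user's own mail. No user feedback is involved at any step, which is the property the abstract emphasizes.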
doi:10.1109/ictai.2007.66 dblp:conf/ictai/JunejoK07 fatcat:ucyps6ixrnfk5jexny66sar75i