A soft frequent pattern mining approach for textual topic detection

Georgios Petkos, Symeon Papadopoulos, Luca Aiello, Ryan Skraba, Yiannis Kompatsiaris
2014 Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14) - WIMS '14  
Unsolicited email, or spam, has recently become a major threat that can negatively impact the usability of electronic mail. Spam wastes substantial time and money for business users and network administrators, consumes network bandwidth and storage space, and slows down email servers. In addition, it provides a medium for distributing harmful code and/or offensive content, and no complete solution to this problem exists. In this paper we present a novel approach to spam filtering that uses a new set of features for classification models. These features are the sequential unique and closed patterns extracted from the content of messages. After applying a term selection method, we show that these features perform well in distinguishing spam from legitimate messages. Results on 6 different datasets show the effectiveness of our proposed method compared to closely related methods: accuracy improves by nearly 2% over the related state of the art. In addition, our method is resilient against the injection of irrelevant and bothersome words.
doi:10.1145/2611040.2611068 dblp:conf/wims/PetkosPASK14 fatcat:zdndpybdebfaphfuzngotjd2fy