Clustering Spam Campaigns with Fuzzy Hashing

Jianxing Chen, Romain Fontugne, Akira Kato, Kensuke Fukuda
2014 Proceedings of the AINTEC 2014 on Asian Internet Engineering Conference - AINTEC '14  
Identifying spamming botnets is essential to defeat spammers and reduce the harm caused by spam emails. The first step in uncovering these botnets is the identification of spam campaigns. Simple methods that look for common identifiers in emails, such as URLs or email addresses, are inefficient due to the emergence of obfuscation techniques like URL shortening. In this paper we propose a new method based on fuzzy hashing to cluster spam with common goals into the same spam campaign. Fuzzy hashing allows us to identify emails with similar contents even though the usual identifiers are obfuscated. Using the proposed method, we process a three-year-long dataset that consists of 540 thousand spam emails. The efficiency of the proposed method is assessed by inspecting the characteristics of the top 100 campaigns found. Finally, we present typical behaviors of the uncovered spam campaigns and the corresponding botnets.
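The abstract does not state which fuzzy hashing algorithm or clustering procedure the authors use, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a toy piecewise hash (boundaries chosen from a small sliding window, one signature character per piece) and a greedy single-pass clustering with a similarity threshold. The names `toy_fuzzy_hash`, `similarity`, and `cluster_emails`, the parameters `window`, `modulus`, and `threshold`, and the sample messages are all hypothetical.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def toy_fuzzy_hash(text, window=4, modulus=8):
    """Toy piecewise hash (assumption, not the paper's algorithm).

    A piece ends wherever the sum of the last `window` bytes hits a
    trigger value, so boundaries depend only on local content and a
    small edit changes only nearby signature characters.
    """
    data = text.encode("utf-8")
    signature = []
    start = 0
    for i in range(len(data)):
        recent = data[max(0, i - window + 1): i + 1]
        triggered = sum(recent) % modulus == modulus - 1
        if triggered or i == len(data) - 1:
            digest = hashlib.sha256(data[start:i + 1]).digest()
            signature.append(ALPHABET[digest[0] % 64])
            start = i + 1
    return "".join(signature)


def similarity(sig_a, sig_b):
    """Similarity of two signatures in [0, 1], via difflib's matching ratio."""
    return SequenceMatcher(None, sig_a, sig_b).ratio()


def cluster_emails(emails, threshold=0.6):
    """Greedy clustering: join the first cluster whose representative
    signature is similar enough, otherwise start a new cluster."""
    clusters = []  # list of (representative_signature, member_indices)
    for index, body in enumerate(emails):
        sig = toy_fuzzy_hash(body)
        for rep_sig, members in clusters:
            if similarity(sig, rep_sig) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((sig, [index]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Hypothetical messages: the first two differ only in the shortened URL.
    sample = [
        "Cheap pills, order today at http://bit.ly/abc and save big",
        "Cheap pills, order today at http://goo.gl/xyz and save big",
        "Your invoice for last month is attached, please review it",
    ]
    print(cluster_emails(sample))
```

The threshold trades precision for recall: a higher value splits campaigns whose templates vary more, while a lower value risks merging unrelated messages. A production system would more likely rely on an established fuzzy hashing tool than on this toy signature.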
doi:10.1145/2684793.2684803 dblp:conf/aintec/ChenFKF14 fatcat:xoxtlslgobfmle24tsrqquky3a