SF-HME system

Petros Belsis, Kostas Fragos, Stefanos Gritzalis, Christos Skourlas
2006 Proceedings of the 2006 ACM symposium on Applied computing - SAC '06  
Many linear statistical models have recently been proposed in the text classification literature and evaluated against the Unsolicited Bulk Email filtering problem. Despite their popularity, due both to their simplicity and relative ease of interpretation, the linearity assumption about data samples is inappropriate in practice, because it cannot capture the apparent non-linear relationships that characterize these samples. In this paper, we propose the SF-HME, a Hierarchical
... -Experts system, attempting to overcome limitations common to other machine-learning based approaches when applied to spam mail classification. After reducing the dimensionality of the data with the Simba algorithm for feature selection, we evaluated our SF-HME system on a publicly available corpus of emails with very high similarity between legitimate and bulk email, and thus low discriminative potential, where traditional rule-based filtering approaches achieve considerably lower precision. As a result, we confirm the superiority of our SF-HME method over other machine learning approaches, which exhibited lower recall.
doi:10.1145/1141277.1141360 dblp:conf/sac/BelsisFGS06 fatcat:4dpk3ggbsfc4pita2qlhqsyuyi