Features Reweighting and Similarity Coefficient Based Method for Email Spam Filtering

Ahmed Osman Ali Elsiddig, Ammar Ahmed E. Elhadi, Ali Ahmed
2017 American Journal of Applied Sciences  
Spam floods the Internet with many copies of the same message in an attempt to force it on people who would not otherwise choose to receive it. Anti-spam filtering, that is, deciding whether an incoming email is spam, has become an important problem. One of the main difficulties in spam filtering is the high dimensionality of the feature space, so a dimensionality-reduction stage is needed. This study addresses this aspect of spam detection by examining the effect of reweighting features. The work first applies similarity coefficients to the dataset, then reweights the features and applies the similarity coefficients to the resulting dataset. Finally, it compares the results before and after reweighting, and compares both against a feature selection method. The objectives of this thesis are to study the Cosine and Dice similarity coefficients and to study the effect of the important features on the other features through the reweighting process. The main results are that reweighting did not improve the success rate of either method (Cosine or Dice), whereas feature selection improved detection with Cosine.
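The two similarity coefficients named in the abstract have standard definitions: for feature vectors x and y, Cosine = (x·y)/(‖x‖‖y‖) and Dice = 2(x·y)/(‖x‖² + ‖y‖²), which for 0/1 term-presence vectors reduces to the set form 2|A∩B|/(|A|+|B|). The Python sketch below computes both and plugs them into a simple spam/ham decision. The abstract does not describe the decision rule or the reweighting scheme, so the 1-nearest-neighbour rule in `classify` and the per-feature multiplicative weights in `reweight` are assumptions made for illustration, not the authors' method; all function names and the toy data are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    """Cosine coefficient (x . y) / (||x|| * ||y||); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def dice(x: Vector, y: Vector) -> float:
    """Dice coefficient 2 (x . y) / (||x||^2 + ||y||^2).

    On 0/1 vectors this equals 2|A & B| / (|A| + |B|), the set form."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y)
    return 0.0 if denom == 0.0 else 2.0 * dot / denom


def reweight(x: Vector, weights: Vector) -> List[float]:
    """Hypothetical reweighting: multiply each feature by its own weight."""
    return [a * w for a, w in zip(x, weights)]


def classify(
    email: Vector,
    train: Sequence[Tuple[Vector, str]],
    sim: Callable[[Vector, Vector], float],
) -> str:
    """Hypothetical decision rule: label of the most similar training email (1-NN).

    Falls back to "ham" if the training set is empty."""
    best_label, best_score = "ham", -math.inf
    for vec, label in train:
        score = sim(email, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy binary term-presence vectors; features and labels are made up.
    train = [([1, 1, 0, 0], "spam"), ([0, 0, 1, 1], "ham")]
    email = [1, 0, 0, 0]
    print(classify(email, train, cosine))  # → spam
    print(classify(email, train, dice))    # → spam

    # Apply the same hypothetical weights to training and incoming vectors.
    weights = [2.0, 1.0, 1.0, 1.0]
    rw_train = [(reweight(v, weights), label) for v, label in train]
    print(classify(reweight(email, weights), rw_train, cosine))  # → spam
```

Because both coefficients are built on dot products, scaling a feature changes how much it contributes to every similarity score; the weights must be applied to training and incoming vectors alike, or the comparison is no longer between vectors in the same space.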
doi:10.3844/ajassp.2017.983.993