Classification of Malicious Web Pages through a J48 Decision Tree, a Naïve Bayes, a RBF Network and a Random Forest Classifier for WebSpam Detection

Muhammad Iqbal, Malik Muneeb Abid, Usman Waheed, Syed Hasnain Alam Kazmi
2017 International Journal of u- and e- Service, Science and Technology  
Web spam is a negative practice in which spammers manipulate search engine results to improve the ranking of their Web pages. It appears on the World Wide Web (WWW) in different forms and lacks a consistent definition. Search engines struggle to eliminate spam pages using machine learning (ML) detectors. Search engines typically measure the quality of websites using various factors (signals), such as the number of visitors, body text, anchor text, back links and forward links, and spammers try to inject these signals into their own pages to subvert the ranking functions of search engines. This study compares the detection efficiency of different ML classifiers trained and tested on the WebSpam UK2007 data set. The results show that the random forest achieved a higher score than the other well-known classifiers.
doi:10.14257/ijunesst.2017.10.4.05 fatcat:ptedb25dhvbqllqq7ukneourzq