Improving web spam classifiers using link structure

Qingqing Gan, Torsten Suel
2007 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web - AIRWeb '07  
Web spam has been recognized as one of the top challenges in the search engine industry [14]. Much recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, whenever an anti-spam technique is developed, spammers design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of the spam in our data. We then design several heuristics that decide whether a node should be relabeled, based on the preclassified result and knowledge about its neighborhood. Our experimental results show visible improvements with respect to precision and recall.
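The second stage of the approach can be illustrated with a minimal sketch. The abstract does not specify the heuristics, so the rule below (flip a label when a large share of a node's linked neighbors carry the opposite label) is an assumption chosen for illustration, not the authors' method. The function name `relabel_by_neighbors`, the `threshold` parameter, and the toy graph are all hypothetical.

```python
from typing import Dict, List


def relabel_by_neighbors(
    labels: Dict[str, bool],
    neighbors: Dict[str, List[str]],
    threshold: float = 0.75,
) -> Dict[str, bool]:
    """Hypothetical second-stage heuristic (an assumption, not the paper's rule).

    labels:    first-stage classifier output, True meaning "spam".
    neighbors: linked nodes for each node (in-links, out-links, or both).
    threshold: share of neighbors that must disagree before a label is flipped.
    """
    relabeled = dict(labels)  # copy, so the first-stage labels stay untouched
    for node, is_spam in labels.items():
        # Only neighbors that received a first-stage label can vote.
        linked = [n for n in neighbors.get(node, []) if n in labels]
        if not linked:
            continue  # no neighborhood evidence, keep the first-stage label
        spam_share = sum(labels[n] for n in linked) / len(linked)
        if not is_spam and spam_share >= threshold:
            relabeled[node] = True   # mostly spam neighbors: relabel as spam
        elif is_spam and spam_share <= 1 - threshold:
            relabeled[node] = False  # mostly non-spam neighbors: relabel as non-spam
    return relabeled


# Toy example: "a" was classified as non-spam but links only to spam nodes.
first_stage = {"a": False, "b": True, "c": True, "d": True, "e": False}
graph = {"a": ["b", "c", "d"], "b": ["a", "c"], "e": []}
second_stage = relabel_by_neighbors(first_stage, graph)
```

Votes are read from the first-stage labels rather than from labels already flipped in the same pass, so the result does not depend on iteration order. This is a design choice of the sketch only.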
doi:10.1145/1244408.1244412 fatcat:w2pvvx6n45bjddnlps22bhw5ze