Detecting Arabic Cloaking Web Pages Using Hybrid Techniques

Heider Wahsheh, Mohammed Al-Kabi, Izzat Alsmadi
2013 International Journal of Hybrid Information Technology  
Many challenges are emerging in the ever-expanding Internet environment, for Internet users and Web site owners alike. Internet users need to retrieve high-quality information relevant to their queries within a short period of time if they are to become regular users who are satisfied with search engine performance. Web site owners, in most cases, aim to increase the rank of their Web pages within search engine results pages (SERPs) to attract more customers to their Web sites, and ... consequently gain more visits, which in turn means more revenue. A top rank within SERPs is very important to e-commerce and commercial Web pages. Web site owners can attract more visitors to their pages, and gain more revenue through Pay Per Click, when their pages appear in the top results of SERPs. This paper proposes a new approach to Arabic Web spam detection, dedicated to cloaking Web pages, using hybrid techniques of content and link analysis. The proposed detection system builds the first Arabic cloaking dataset, which contains around 5,000 Arabic cloaked Web pages. The system extracts all possible rules from HTML elements to monitor cloaking behaviors, and then uses three classification algorithms (K-NN, Decision Tree, and Logistic Regression) in the experimental tests. This novel system achieved an accuracy of 94.1606% in detecting cloaking behaviors in Arabic Web pages.
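Cloaking means serving one version of a page to search engine crawlers and a different version to human visitors. The idea of combining content and link signals can be illustrated with a minimal sketch, which is not the authors' system: it assumes two copies of the same URL have already been fetched (one with a crawler user agent, one with a browser user agent) and measures how far their visible words and outgoing links diverge. The names `PageFeatures`, `extract`, `jaccard_distance`, and `cloaking_features` are hypothetical, and the choice of Jaccard distance is an assumption made for illustration.

```python
from html.parser import HTMLParser


class PageFeatures(HTMLParser):
    """Collect visible words and link targets from one HTML document."""

    def __init__(self):
        super().__init__()
        self.words = set()
        self.links = set()
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script and style bodies; keep whitespace-separated tokens.
        if self._skip_depth == 0:
            self.words.update(data.lower().split())


def extract(html):
    """Return (word set, link set) for an HTML string."""
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    return parser.words, parser.links


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cloaking_features(crawler_html, browser_html):
    """Hypothetical feature vector: content and link divergence."""
    c_words, c_links = extract(crawler_html)
    b_words, b_links = extract(browser_html)
    return {
        "content_diff": jaccard_distance(c_words, b_words),
        "link_diff": jaccard_distance(c_links, b_links),
    }
```

Features of this kind could then be passed to any of the classifiers named in the abstract; the sketch stops at feature extraction because the source does not describe the rules or training setup in detail.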
doi:10.14257/ijhit.2013.6.6.10 fatcat:hrojqrvnvzdkxe3yygigz6dei4