A Segment-based Weighting Technique for URL-based Genre Classification of Web Pages

Chaker Jebari
2016 POLIBITS Research Journal on Computer Science and Computer Engineering With Applications  
We propose a segment-based weighting technique for genre classification of web pages. This technique exploits character n-grams extracted from the URL of the web page rather than from its textual content. The main idea of our technique is to segment the URL and assign a weight to each segment. Experiments conducted on three well-known genre datasets show that our method achieves encouraging results.
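The idea of weighting URL segments before extracting character n-grams can be illustrated with a minimal sketch. It assumes a hypothetical three-way split of the URL (host, directory path, document name), hypothetical weights per segment, lowercasing, and character trigrams. The abstract does not state the actual segmentation scheme, weight values, or n-gram length, so none of these choices should be read as the author's method. Each n-gram's feature value is the sum of the weights of the segments it occurs in.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical per-segment weights (assumption, not taken from the paper).
SEGMENT_WEIGHTS = {"host": 1.0, "path": 2.0, "name": 3.0}


def segment_url(url: str) -> dict:
    """Split a URL into three illustrative segments: host, directory path, document name."""
    parts = urlsplit(url.lower())  # lowercasing is an assumption
    directory, _, name = parts.path.rpartition("/")
    return {"host": parts.netloc, "path": directory, "name": name}


def char_ngrams(text: str, n: int) -> list:
    """Return all overlapping character n-grams of text (empty if text is shorter than n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def weighted_ngram_features(url: str, n: int = 3, weights: dict = SEGMENT_WEIGHTS) -> Counter:
    """Sum, for each n-gram, the weight of every segment occurrence it comes from."""
    features = Counter()
    for seg_name, seg_text in segment_url(url).items():
        w = weights[seg_name]
        for gram in char_ngrams(seg_text, n):
            features[gram] += w
    return features


if __name__ == "__main__":
    feats = weighted_ngram_features("http://www.example.com/blog/post.html")
    print(feats["htm"])  # trigram from the document name, weighted 3.0
```

The resulting sparse feature vector could then be normalized and passed to any standard text classifier. Giving the document name a higher weight than the host is only one plausible choice for this sketch.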
doi:10.17562/pb-53-4