Using neighborhood information for automated categorization of Web pages

Nadejda Panteleeva
2003 International United Information Systems Conference  
In this paper we discuss several issues related to how expanding a Web document's representation influences the quality of topical categorization of Web pages. We consider expanding a Web page with the text content of its linking pages. We show that naive expansion can pull in too much noise and substantially harm categorization results. We present an approach to automated pruning of linking Web pages. We report that using our approach in forming a Web page representation always leads to better ... results than traditional single Web page categorization.
dblp:conf/ista/Panteleeva03 fatcat:gdggxy3dszeghccfvgjoy34wvy