Using neighborhood information for automated categorization of Web pages
2003
International United Information Systems Conference
In this paper we discuss several issues related to how expanding a Web document's representation influences the quality of topical categorization of Web pages. We consider expanding a Web page with the text content of its linking pages. We show that naive expansion can pull in too much noise and substantially harm categorization results. We present an approach to automated pruning of linking Web pages. We report that using our approach in forming a Web page representation always leads to better
dblp:conf/ista/Panteleeva03
fatcat:gdggxy3dszeghccfvgjoy34wvy