A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2016.
The file type is application/pdf.
Topic-oriented collaborative crawling
2002
Proceedings of the eleventh international conference on Information and knowledge management - CIKM '02
A major concern in the implementation of a distributed Web crawler is the choice of a strategy for partitioning the Web among the nodes in the system. Our goal in selecting this strategy is to minimize the overlap between the activities of individual nodes. We propose a topic-oriented approach, in which the Web is partitioned into general subject areas with a crawler assigned to each. We examine design alternatives for a topic-oriented distributed crawler, including the creation of a Web page
doi:10.1145/584800.584802
fatcat:rgcoz7oxuzdivj6tsivbzvnkeu