A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2016.
Topic relevance of pages and hyperlinks is the key issue in focused crawling. In this paper, an improved topic relevance algorithm for focused crawling is proposed. First, we implement a prototype focused crawler, a topic-specific news gathering system, which is used for comparative experiments on different similarity measures applied to anchor text. Second, experiments on a Chinese text corpus show that using LSI (Latent Semantic Indexing) outperforms using TF-IDF (term...
doi:10.1109/icsmc.2011.6083759 dblp:conf/smc/HaoMYLW11 fatcat:qex6mjeitvcghltaujii3biycq
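To make the idea of scoring anchor text against a topic concrete, the following is a minimal sketch of the TF-IDF baseline only. It is not the authors' system: the function names (`tfidf_vectors`, `cosine`, `rank_anchors`), the smoothed IDF formula, and the whitespace tokenization are all assumptions chosen for illustration. Chinese text would need a word segmenter in place of `split()`, and the LSI variant is not shown here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized doc."""
    n = len(docs)
    # Document frequency: in how many docs each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption), so terms found in every doc keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_anchors(topic_tokens, anchors):
    """Score each anchor text against the topic and sort by descending score."""
    # Hypothetical tokenization: lowercase and split on whitespace.
    docs = [list(topic_tokens)] + [a.lower().split() for a in anchors]
    vecs = tfidf_vectors(docs)
    topic_vec = vecs[0]
    scored = [(cosine(topic_vec, v), a) for a, v in zip(anchors, vecs[1:])]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    topic = "stock market news".split()
    anchors = ["stock market news today", "football match results", "market news"]
    for score, anchor in rank_anchors(topic, anchors):
        print(f"{score:.3f}  {anchor}")
```

A focused crawler could use scores like these to decide which hyperlinks to follow first. An anchor that shares no terms with the topic scores exactly zero here, which is the kind of vocabulary mismatch that a latent-space measure such as LSI is meant to soften.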