A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2004; you can also visit the original URL.
The file type is
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic associations between pages therefore crucially affects the performance of any search tool. Here I begin to quantitatively analyze the relationship between content, link, and semantic similarity measures across a massive number of Web page pairs. Maps of semantic similarity across textual and link similarity highlight thedoi:10.1145/1013367.1013521 dblp:conf/www/Menczer04 fatcat:vralmmmzg5axjmm4t4k6d7plc4