Combining link and content analysis to estimate semantic similarity

Filippo Menczer
Alternate Track Papers & Posters of the 13th International Conference on World Wide Web (WWW Alt. '04), 2004
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic associations between pages therefore crucially affect the performance of any search tool. Here I begin to quantitatively analyze the relationship between content, link, and semantic similarity measures across a massive number of Web page pairs. Maps of semantic similarity across textual and link similarity highlight the potential and limitations of lexical and link analysis for relevance approximation, and provide us with a way to study whether and how text- and link-based measures should be combined.
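The abstract does not spell out which content, link, or semantic measures are used, so the following is only a minimal sketch under stated assumptions, meant to make the idea of a "map of semantic similarity across textual and link similarity" concrete. It assumes cosine similarity over raw term-frequency vectors as the content measure, Jaccard overlap of link sets as the link measure, and a semantic similarity score already computed for each page pair by some external source. The function names (`cosine_similarity`, `jaccard_similarity`, `similarity_map`) and the grid-binning scheme are hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter, defaultdict


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Content similarity (assumed measure): cosine of raw term-frequency vectors."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Dot product over the terms the two pages share.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def jaccard_similarity(links_a: set, links_b: set) -> float:
    """Link similarity (assumed measure): Jaccard overlap of two link neighborhoods."""
    union = links_a | links_b
    if not union:
        return 0.0
    return len(links_a & links_b) / len(union)


def similarity_map(pairs, bins: int = 10) -> dict:
    """Hypothetical 'map': mean semantic similarity per (content, link) grid cell.

    Each pair is a tuple (s_content, s_link, s_semantic) with scores in [0, 1].
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s_content, s_link, s_semantic in pairs:
        # Clamp so that a score of exactly 1.0 falls into the last bin.
        i = min(int(s_content * bins), bins - 1)
        j = min(int(s_link * bins), bins - 1)
        sums[(i, j)] += s_semantic
        counts[(i, j)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


if __name__ == "__main__":
    print(cosine_similarity("web search engine", "search engine ranking"))
    print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
    print(similarity_map([(0.05, 0.05, 0.2), (0.07, 0.02, 0.4), (0.95, 1.0, 0.9)]))
```

In this sketch the two toy pages share two of three terms, giving a cosine of 2/3, and the two link sets share two of four distinct links, giving a Jaccard score of 0.5. Averaging semantic scores per cell, rather than plotting raw pairs, is one simple way to expose where lexical and link cues agree or disagree with semantic relatedness across many page pairs.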
doi:10.1145/1010432.1010586 fatcat:okxs36gcovabbmcz4d35vytnpa