User-assisted similarity estimation for searching related web pages

Lin Li, Zhenglu Yang, Kulwadee Somboonviwat, Masaru Kitsuregawa
2007 Proceedings of the 18th conference on Hypertext and hypermedia - HT '07  
To utilize the similarity information hidden in the Web graph, we investigate the problem of adaptively retrieving related Web pages with user assistance. Given a definition of similarities between pages, it is intuitive to estimate that any similarity will propagate from page to page, inducing an implicit topical relatedness between pages. In this paper, we extract connected subgraphs from the whole graph that consists of all pairs of pages whose similarity scores are above a given threshold, and then sort the candidates of related pages by a novel rank measure which is based on the combination distances of a flexible hierarchical clustering. Moreover, due to the subjectivity of similarity values, we dynamically supply the ordering list of related pages according to a parameter adjusted by users. We show our approach effectively handles a set of pages originating from three related categories of Web hierarchies, such as Google Directory. The experiments with three similarity measures demonstrate that using in-link information is favorable while using a combination measure of in-links and out-links lowers the precision of identifying similar pages.
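
The subgraph-extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `connected_subgraphs`, the dictionary input format, and the example scores are all assumptions made for illustration, and the ranking step based on hierarchical clustering is not covered. The sketch keeps only page pairs whose score exceeds the threshold and groups the remaining pages into connected components.

```python
from collections import defaultdict


def connected_subgraphs(similarities, threshold):
    """Return connected components of the thresholded similarity graph.

    similarities: dict mapping (page_a, page_b) -> score (hypothetical format).
    Pages with no pair above the threshold do not appear in the result.
    """
    # Build an undirected adjacency list from pairs scoring above the threshold.
    adjacency = defaultdict(set)
    for (a, b), score in similarities.items():
        if score > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collects every page reachable from `start`.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Made-up scores for five hypothetical pages.
example = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.2, ("d", "e"): 0.7}
groups = connected_subgraphs(example, threshold=0.5)
```

Raising the threshold splits the graph into smaller groups and lowering it merges them, which is one way to picture the user-adjusted parameter the abstract describes.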
doi:10.1145/1286240.1286245 dblp:conf/ht/LiYSK07 fatcat:hh4rhuwtzzfknnyhn3ozcnz53q