Contextualization using hyperlinks and internal hierarchical structure of Wikipedia documents

Muhammad Ali Norozi, Paavo Arvola, Arjen P. de Vries
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
Context surrounding hyperlinked semi-structured documents, externally in the form of citations and internally in the form of hierarchical structure, contains a wealth of useful but implicit evidence about a document's relevance. These rich sources of information should be exploited as contextual evidence. This paper proposes various methods of accumulating evidence from the context, and measures the effect of contextual evidence on retrieval effectiveness for document and focused retrieval of hyperlinked semi-structured documents. We propose a re-weighting model to contextualize (a) evidence from citations in a query-independent and query-dependent fashion (based on Markovian random walks) and (b) evidence accumulated from the internal tree structure of documents. The in-links and out-links of a node in the citation graph are used as external context, while the internal document structure provides internal, within-document context. We hypothesize that documents in a good context (having strong contextual evidence) should be good candidates to be relevant to the posed query, and vice versa. We tested several variants of contextualization and verified notable improvements in comparison with the baseline system and gold standards in the retrieval of full documents and focused elements.
doi:10.1145/2396761.2396855 dblp:conf/cikm/NoroziAV12 fatcat:gmpjpuxrm5b5hljmsgdp6sf7di