Revisiting the cluster-based paradigm for implicit search result diversification

Hai-Tao Yu, Adam Jatowt, Roi Blanco, Hideo Joho, Joemon M. Jose, Long Chen, Fajie Yuan
<span title="">2018</span> <i title="Elsevier BV"> <a target="_blank" rel="noopener" href="" style="color: black;">Information Processing &amp; Management</a> </i> &nbsp;
2018) Revisiting the cluster-based paradigm for implicit search result diversification. Abstract To cope with ambiguous and/or underspecified queries, search result diversification (SRD) is a key technique that has attracted a lot of attention. This paper focuses on implicit SRD, where the subtopics underlying a query are unknown. Many existing methods appeal to the greedy strategy for generating diversified results. A common practice is using a heuristic criterion for making the locally
more &raquo; ... choice at each round. As a result, it is difficult to know whether the failures are caused by the optimization criterion or the setting of parameters. Different from previous studies, we formulate implicit SRD as a process of selecting and ranking k exemplar documents through integer linear programming (ILP). The key idea is that: for a specific query, we expect to maximize the overall relevance of the k exemplar documents. Meanwhile, we wish to maximize the representativeness of the selected exemplar documents with respect to the nonselected documents. Intuitively, if the selected exemplar documents concisely represent the entire set of documents, the novelty and diversity will naturally arise. Moreover, we propose two approaches ILP4ID (Integer Linear Programming for Implicit SRD) and AP4ID (Affinity Propagation for Implicit SRD) for solving the proposed formulation of implicit SRD. In particular, ILP4ID appeals to the strategy of bound-and-branch and is able to obtain the optimal solution. AP4ID being an approximate method transforms the target problem as a maximum-a-posteriori inference problem, and the message passing algorithm is adopted to find the solution. Furthermore, we investigate the differences and connections between the proposed models and prior models by casting them as * Corresponding author different variants of the cluster-based paradigm for implicit SRD. To validate the effectiveness and efficiency of the proposed approaches, we conduct a series of experiments on four benchmark TREC diversity collections. The experimental results demonstrate that: (1) The proposed methods, especially ILP4ID, can achieve substantially improved performance over the state-of-the-art unsupervised methods for implicit SRD. (2) The initial runs, the number of input documents, query types, the ways of computing document similarity, the predefined cluster number and the optimization algorithm significantly affect the performance of diversification models. Careful examinations of these factors are highly recommended in the development of implicit SRD methods. Based on the in-depth study of different types of methods for implicit SRD, we provide additional insight into the cluster-based paradigm for implicit SRD. In particular, how the methods relying on greedy strategies impact the performance of implicit SRD, and how a particular diversification model should be fine-tuned.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="">doi:10.1016/j.ipm.2018.03.003</a> <a target="_blank" rel="external noopener" href="">fatcat:hq5ln4f3kfhrvjr735quvvrnnq</a> </span>
<a target="_blank" rel="noopener" href="" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href=""> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> </button> </a>