Max-Sum Diversification, Monotone Submodular Functions and Dynamic Updates [article]

Allan Borodin, Aadhar Jain, Hyun Chul Lee, Yuli Ye
2016 arXiv pre-print
Result diversification is an important aspect of web-based search, document summarization, facility location, portfolio management and other applications. Given a set of ranked results for a set of objects (e.g. web documents, facilities, etc.) with a distance between any pair, the goal is to select a subset S satisfying the following three criteria: (a) the subset S satisfies some constraint (e.g. bounded cardinality); (b) the subset contains results of high "quality"; and (c) the subset contains results that are "diverse" relative to the distance measure. The goal of result diversification is to produce a diversified subset while maintaining high quality as much as possible. We study a broad class of problems where the distances are a metric, where the constraint is given by independence in a matroid, where quality is determined by a monotone submodular function, and diversity is defined as the sum of distances between objects in S. Our problem is a generalization of the max sum diversification problem studied in GoSh09, which in turn is a generalization of the max sum p-dispersion problem studied extensively in location theory. It is NP-hard even with the triangle inequality. We propose two simple and natural algorithms: a greedy algorithm for a cardinality constraint and a local search algorithm for an arbitrary matroid constraint. We prove that both algorithms achieve constant approximation ratios.
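As a rough illustration of the setting described in the abstract, the following Python sketch scores a subset by a quality term plus the sum of pairwise distances and builds a subset greedily under a cardinality constraint. The abstract does not state the exact objective weighting or the greedy selection rule, so the trade-off parameter `lam`, the function names, and the plain marginal-gain rule are all assumptions made for illustration and may differ from the authors' algorithm.

```python
from itertools import combinations

def objective(S, quality, dist, lam=1.0):
    """Quality of S plus lam times the sum of pairwise distances in S.

    `lam` is a hypothetical trade-off weight, not taken from the abstract.
    """
    return quality(S) + lam * sum(dist(u, v) for u, v in combinations(S, 2))

def greedy_diversify(items, k, quality, dist, lam=1.0):
    """Pick up to k items, each time adding the item with the largest
    marginal gain in the combined objective (an assumed selection rule)."""
    S = set()
    while len(S) < min(k, len(items)):
        base = objective(S, quality, dist, lam)
        best = max(
            (x for x in items if x not in S),
            key=lambda x: objective(S | {x}, quality, dist, lam) - base,
        )
        S.add(best)
    return S

# Toy instance: points on a line, absolute difference as the metric,
# and a modular (hence monotone submodular) quality function.
items = [0, 1, 5, 10]
weights = {0: 1, 1: 3, 5: 1, 10: 1}
quality = lambda S: sum(weights[x] for x in S)
dist = lambda u, v: abs(u - v)

chosen = greedy_diversify(items, 2, quality, dist)
print(chosen, objective(chosen, quality, dist))
```

On this toy instance the first pick is driven by quality alone, since an empty set has no distances, and the second pick is dominated by the distance term.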
arXiv:1203.6397v3 fatcat:3phfoyvnazeqnbdyaprqbvhjui