Multi-document summarization via submodularity

Jingxuan Li, Lei Li, Tao Li
2012 Applied intelligence (Boston)  
Multi-document summarization is becoming an important issue in the Information Retrieval community. It aims to distill the most important information from a set of documents to generate a compressed summary. Given a set of documents as input, most existing multi-document summarization approaches use sentence selection techniques to extract a set of sentences from the document set as the summary. The submodularity hidden in the term coverage and the textual-unit similarity motivates us to incorporate this property into our solution to multi-document summarization tasks. In this paper, we propose a new principled and versatile framework for different multi-document summarization tasks using submodular functions (Nemhauser et al. in Math. Prog. 14(1): 1978) based on the term coverage and the textual-unit similarity, which can be efficiently optimized through the improved greedy algorithm. We show that four known summarization tasks, including generic, query-focused, update, and comparative summarization, can be modeled as different variations derived from the proposed framework. Experiments on benchmark summarization data sets (e.g., DUC04-06, TAC08, TDT2 corpora) are conducted to demonstrate the efficacy and effectiveness of our proposed framework for the general multi-document summarization tasks.
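To make the idea of greedily optimizing a submodular term-coverage objective concrete, here is a minimal sketch in Python. It is not the paper's improved greedy algorithm or its objective. It assumes a plain unweighted count of distinct covered terms, whitespace tokenization, a word budget, and a standard cost-scaled greedy rule. The names `term_coverage` and `greedy_summary` are hypothetical.

```python
from typing import List, Set


def term_coverage(term_sets: List[Set[str]]) -> int:
    """Number of distinct terms covered; a monotone submodular set function."""
    covered: Set[str] = set()
    for terms in term_sets:
        covered |= terms
    return len(covered)


def greedy_summary(sentences: List[str], budget: int) -> List[int]:
    """Return indices of sentences chosen by cost-scaled greedy selection.

    Assumption: each step picks the sentence with the largest number of
    newly covered terms per word, among those that still fit in `budget`.
    """
    terms = [set(s.lower().split()) for s in sentences]
    lengths = [len(s.split()) for s in sentences]
    chosen: List[int] = []
    covered: Set[str] = set()
    used = 0
    while True:
        best, best_ratio = None, 0.0
        for i in range(len(sentences)):
            if i in chosen or lengths[i] == 0 or used + lengths[i] > budget:
                continue
            gain = len(terms[i] - covered)  # marginal gain in coverage
            ratio = gain / lengths[i]
            if ratio > best_ratio:  # ties keep the earliest sentence
                best, best_ratio = i, ratio
        if best is None:  # nothing fits or nothing adds new terms
            break
        chosen.append(best)
        covered |= terms[best]
        used += lengths[best]
    return chosen
```

Because coverage has diminishing returns (a sentence adds fewer new terms the more has already been selected), the marginal gain of each candidate can only shrink as the summary grows, which is the property that makes greedy selection a natural fit for objectives of this kind.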
doi:10.1007/s10489-012-0336-1 fatcat:2rc3wuxvrreeffjp5sxl5l426e