Multi-document Summarization by Information Distance

Chong Long, Minlie Huang, Xiaoyan Zhu, Ming Li
2009 2009 Ninth IEEE International Conference on Data Mining  
We live in a world where information grows and changes quickly. Automatic document summarization and update techniques help readers acquire knowledge more efficiently. This paper describes a novel approach to multi-document update summarization. The best summary is defined as the one that has the minimal information distance to the entire document set. The best update summary is the one that has the minimal conditional information distance to a document cluster, given that a prior document cluster has already been read. We propose two methods to approximate the information distance between two documents, one based on compression and the other on coding theory. Experiments on the DUC 2007 and TAC 2008 datasets show that our method correlates closely with human-written summaries and outperforms LexRank in many categories under the ROUGE evaluation criterion.
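The compression-based idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes zlib as the compressor, uses the standard normalized compression distance as a stand-in for information distance, and selects sentences greedily. The function names (`compressed_size`, `ncd`, `greedy_summary`) and the `max_sentences` parameter are hypothetical and chosen only for illustration.

```python
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed text (assumed compressor)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance, a computable proxy for
    information distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def greedy_summary(sentences, documents, max_sentences=2):
    """Greedily pick sentences whose concatenation is closest (by NCD)
    to the whole document set. The greedy search is an assumption of
    this sketch, not a procedure stated in the abstract."""
    target = " ".join(documents)
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        # Choose the candidate that minimizes the distance of the
        # extended summary to the full document set.
        best = min(remaining, key=lambda s: ncd(" ".join(chosen + [s]), target))
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    docs = [
        "Floods hit the coastal town on Monday. Rescue teams evacuated residents.",
        "Rescue teams evacuated hundreds of residents after floods hit the town.",
    ]
    candidates = [
        "Floods hit the coastal town on Monday.",
        "Rescue teams evacuated residents.",
        "The weather was pleasant last year.",
    ]
    print(greedy_summary(candidates, docs, max_sentences=2))
```

For the update setting, one would presumably replace `ncd` with a conditional variant that discounts what the prior cluster already explains; the abstract does not give that formula, so it is left out here.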
doi:10.1109/icdm.2009.107 dblp:conf/icdm/LongHZL09 fatcat:5jfcleuf45exln5msroxb3xesu