Evaluation challenges in large-scale document summarization

Dragomir R. Radev, Simone Teufel, Horacio Saggion, Wai Lam, John Blitzer, Hong Qi, Arda Çelebi, Danyu Liu, Elliott Drabek
2003 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - ACL '03  
We present a large-scale meta evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end we built a corpus consisting of (a) 100 million automatic summaries using six summarizers and baselines at ten summary lengths in both English and Chinese, (b) more than 10,000 manual abstracts and extracts, and (c) 200 million automatic document and summary retrievals using 20 queries. We present both qualitative and quantitative results showing the strengths and drawbacks of all evaluation methods and how they rank the different summarizers.
doi:10.3115/1075096.1075144 dblp:conf/acl/RadevTSLBQCLD03 fatcat:3hssojx25jhwlgr52u7b36aiiy