Evaluation metrics for generation

Srinivas Bangalore, Owen Rambow, Steve Whittaker
2000 Proceedings of the first international conference on Natural language generation - INLG '00  
Certain generation applications may profit from the use of stochastic methods. In developing stochastic methods, it is crucial to be able to quickly assess the relative merits of different approaches or models. In this paper, we present several types of intrinsic (system internal) metrics which we have used for baseline quantitative assessment. This quantitative assessment should then be augmented to a fuller evaluation that examines qualitative aspects. To this end, we describe an experiment that tests correlation between the quantitative metrics and human qualitative judgment. The experiment confirms that intrinsic metrics cannot replace human evaluation, but some correlate significantly with human judgments of quality and understandability and can be used for evaluation during development.
doi:10.3115/1118253.1118255 dblp:conf/inlg/BangaloreRW00 fatcat:32hwzkmc2zdjtdooqvfccuci4u