Salience Estimation and Faithful Generation: Modeling Methods for Text Summarization and Generation

Christopher Kedzie
This thesis is focused on a particular text-to-text generation problem, automatic summarization, where the goal is to map a large input text to a much shorter summary text. The research presented aims to both understand and tame existing machine learning models, hopefully paving the way for more reliable text-to-text generation algorithms. Somewhat against the prevailing trends, we eschew end-to-end training of an abstractive summarization model, and instead break down the text summarization
more » ... blem into its constituent tasks. At a high level, we divide these tasks into two categories: content selection, or "what to say" and content realization, or "how to say it" (McKeown, 1985). Within these categories we propose models and learning algorithms for the problems of salience estimation and faithful generation. Salience estimation, that is, determining the importance of a piece of text relative to some context, falls into a problem of the former category, determining what should be selected for a summary. In particular, we experiment with a variety of popular or novel deep learning models for salience estimation in a single document summarization setting, and design several ablation experiments to gain some insight into which input signals are most important for making predictions. Understanding these signals is critical for designing reliable summarization models. We then consider a more difficult problem of estimating salience in a large document stream, and propose two alternative approaches using classical machine learning techniques from both unsupervised clustering and structured prediction. These models incorporate salience estimates into larger text extraction algorithms that also consider redundancy and previous extraction decisions. Overall, we find that when simple, position based heuristics are available, as in single document news or research summarization, deep learning models of salience often exploit them to make predictions, while ignoring the arguably more important content features of the input. [...]
doi:10.7916/d8-61n8-mg23 fatcat:sztaye4ftvhubjibjd3ty5dqy4