Hybrid Summarization with Semantic Weighting Reward and Latent Structure Detector

Mingyang Song, Liping Jing, Yi Feng, Zhiwei Sun, Lin Xiao
2021 Asian Conference on Machine Learning  
Text summarization has been a significant challenge in the Natural Language Processing (NLP) field. Approaches to text summarization can be roughly divided into two main paradigms: extractive and abstractive. The former captures the most representative snippets in a document, while the latter generates a summary by understanding the latent meaning of the material with a language generation model. Recently, studies have found that jointly employing extractive and abstractive summarization models can exploit their complementary strengths, creating summaries that are both concise and informative. However, reinforced summarization models mainly depend on a ROUGE-based reward, which can only quantify the extent of word matching, rather than semantic matching, between document and summary. Meanwhile, in real-world applications, documents often contain redundant or noisy content in the form of repeated or irrelevant information. Therefore, depending only on a ROUGE-based reward to optimize reinforced summarization models may lead to biased summary generation. In this paper, we propose a novel deep Hybrid Summarization with semantic weighting Reward and latent structure Detector (HySRD). Specifically, HySRD introduces a new reward mechanism that simultaneously takes advantage of semantic and syntactic information among documents and summaries. To effectively model accurate semantics, a latent structure detector is designed to incorporate high-level latent structures into the sentence representation for information selection. Extensive experiments have been conducted on two well-known benchmark datasets: CNN/Daily Mail (short input documents) and BigPatent (long input documents). The automatic evaluation shows that our approach significantly outperforms state-of-the-art hybrid summarization models.
dblp:conf/acml/SongJFSX21 fatcat:xxzu4pum5vfxdhjvs6ub27o7rm