Biased LexRank: Passage retrieval using random walks with question-based priors

Jahna Otterbacher, Gunes Erkan, Dragomir R. Radev
2009 Information Processing & Management  
We present Biased LexRank, a method for semi-supervised passage retrieval in the context of question answering.  ...  We then perform a random walk on the lexical similarity graph in order to recursively retrieve additional passages that are similar to other relevant passages.  ...  0329043, "Probabilistic and Link-based Methods for Exploiting Very Large Textual Repositories".  ... 
doi:10.1016/j.ipm.2008.06.004 fatcat:b5ro27imybesdgdmfdwmq557lq

Document Summarization with Latent Queries

Yumo Xu, Mirella Lapata
2022 Transactions of the Association for Computational Linguistics  
We model queries as discrete latent variables over document tokens, and learn representations compatible with observed and unobserved query verbalizations.  ...  Acknowledgments: The authors would like to thank the action editor, Wenjie Li, and the anonymous reviewers for  ...  We are grateful to Md Tahmid Rahman Laskar and Haichao Zhu for providing us with system  ...  LEXRANK (Erkan and Radev, 2004) estimates sentence-level centrality via a Markov Random Walk on graphs. The second block includes two additional extractive systems.  ... 
doi:10.1162/tacl_a_00480 fatcat:xcfvrc5mtvdifi72ngu4wpsm4y

Knowledge Base Driven Automatic Text Summarization using Multi-objective Optimization

Chihoon Jung, Wan Chul Yoon, Rituparna Datta, Sukhwan Jung
2021 International Journal of Advanced Computer Science and Applications  
With this formulation, we propose a novel technique to improve the performance using a knowledge base.  ...  The main rationale of the approach is to extract important text features of the original text by detecting important entities in a knowledge base.  ...  The method allows topic, or query, sensitive sentence retrieval with weighted random-walk based on a prior distribution of sentence ranks, performing well on both extractive text summarization and passage  ... 
doi:10.14569/ijacsa.2021.0120895 fatcat:dhx6it637nahzp2r5jp3a2dk44

QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization [article]

Ming Zhong, Da Yin, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli Celikyilmaz, Yang Liu, Xipeng Qiu, Dragomir Radev
2021 arXiv pre-print
In order to satisfy the needs of different types of users, we define a new query-based multi-domain meeting summarization task, where models have to select and summarize relevant spans of meetings in response  ...  ., PM) to denote speakers, and use the full name like 'Project Manager' instead. Abbreviations. In the raw meeting transcripts, some of abbreviations are along with character '_'.  ...  Since all the meetings happened, we ask annotators to use past tense. How to Denote Speakers. If the gender information is unclear, we would ask annotators not to use 'he/she' to denote speakers.  ... 
arXiv:2104.05938v1 fatcat:24p3u2nimrei7mwnwr32ouk2qq

Text Summarization and Categorization for Scientific and Health-Related Data

Arman Cohan
2019 SIGIR Forum  
I am deeply indebted to my extraordinary wife, Maryam, throughout this journey, for our many wonderful years and always standing beside me, even when more than 6,000 miles were between us.  ...  My research goal in this dissertation is to develop Natural Language Processing (NLP) and Information Retrieval (IR) methods for better processing and  ...  I am also grateful to Ophir Frieder for all our discussions  ...  I especially want to thank you for your perspective and helping me pursue and define projects with real impact.  ... 
doi:10.1145/3308774.3308802 fatcat:bubs6xecbzdy3dcnpxtnvxsh44

Becker_columbia_0054D_10406.pdf [article]

results, using term extraction and frequency analysis, with the goal of improving recall.  ...  For this content selection task, we experiment with several centrality-based techniques that consider the similarity of each event-related document to the central theme of its associated event and to other  ...  For the LexRank approach we used the Mead toolkit [ER04] with the LexRank feature option, which produces a ranked list of messages according to their LexRank score.  ... 
doi:10.7916/d8qv4031 fatcat:mkjdoxrbvnc5zgf4nx2hnxfowa

Identification and Characterization of Events in Social Media

Hila Becker
In this scenario, we use an online clustering framework to identify these unknown events and their associated social media documents.  ...  We use these observations to inform our event identification techniques. To identify events in social media, we follow two possible scenarios.  ...  For the LexRank approach we used the Mead toolkit [ER04] with the LexRank feature option, which produces a ranked list of messages according to their LexRank score.  ... 
doi:10.7916/d8vm4qvd fatcat:hex4l6kkw5gc7l2jm735eh5e64

Content Modeling for Automatic Document Summarization [article]

Leonhard Hennig, Technische Universität Berlin, Sahin Albayrak
Computational methods that progress beyond today's document-centric information retrieval solutions are therefore essential to help users to cope with the sheer amount of relevant documents and the information  ...  However, the burden of finding the searched-for information within these documents stays with the user.  ...  of LexRank.  ... 
doi:10.14279/depositonce-3039 fatcat:phocc7ykafa3lj7vugj2sbdmo4

Discourse analysis of asynchronous conversations

Shafiq Rayhan Joty
Our graph-based approach extends state-of-the-art methods by integrating a fine-grained conversational structure with other conversational features.  ...  ., conversations where participants communicate with each other at different times (e.g., emails, blogs).  ...  The random walk framework allows us to incorporate knowledge from multiple sources as priors [208], biases [154] and co-ranking [238].  ... 
doi:10.14288/1.0165726 fatcat:jixdchdwqzecployo5xgznb534

Summarization of Changes in Dynamic Text Collections

Manika Kar
2013 unpublished
Based on this idea, Wan and Yang [WY08] proposed Cluster-based Conditional Markov Random Walk Model (Cluster CMRW), which is an improvement of the MRW model or the PageRank algorithm [PBMW99] by incorporating  ...  However, K is determined in a setting where a symmetric Dirichlet prior on θ with α = 0.5 and a symmetric Dirichlet prior on φ with β = 0.1 are used.  ... 
doi:10.14236/ewic/fdia2013.4 fatcat:p2r5a4vetbghbclqaovludowam

Workshop Crossing Barriers in Text Summarization Research

Horacio Saggion, Jean-Luc Borovets, Bulgaria, Horacio Saggion, Choy-Kim Chuah, Gustavo Crispino, Donna Harman, Min-Yen Kan, Guy Lapalme, Constantin Orasan, Dragomir Radev, Stan Szpakowicz (+5 others)
2005 unpublished
such as a task-based retrieval study.  ...  Sanderson "Advantages of query biased summarising in Information Retrieval", Proceeding of the 21st Annual ACM SIGIR International Conference on Research and Development in Information Retrieval (SIGIR  ...