Subtopic structuring for full-length document access

Marti A. Hearst, Christian Plaunt
1993 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '93  
We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents, that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
doi:10.1145/160688.160695 dblp:conf/sigir/HearstP93 fatcat:4skenl335jdrfgdvgpousbguta