Document Retrieval: Expertise in Identifying Relevant Documents

Philip J. Smith
1990 IEEE Data Engineering Bulletin  
Data Engineering Bulletin is a quarterly publication of the IEEE Computer Society Technical Committee on Data Engineering. Its scope of interest includes: data structures and models, access strategies, access control techniques, database architecture, database machines, intelligent front ends, mass storage for very large databases, distributed database systems and techniques, database software design and implementation, database utilities, database security and related areas. Contribution to ... e Bulletin is hereby solicited. News items, letters, technical papers, book reviews, meeting previews, summaries, case studies, etc., should be sent to the Editor. All letters to the Editor will be considered for publication unless accompanied by a request to the contrary. Technical

Letter from the Editor

Document retrieval deals with the capture, storage, and retrieval of natural language texts, which could range from short bibliographic records to full-text documents. Document retrieval has been investigated for over three decades, but its application has thus far been limited to library systems. The proliferation of PCs, workstations, online databases, and hypertext systems has presented new challenges and opportunities to this research area. Research in this area is not only of interest to large-scale systems such as library systems and news databases but also has a profound impact on the way we manage our personal, day-to-day data.

The special issue has assembled eight papers examining various aspects of this important topic. The first paper, by Salton, describes the SMART system, which is perhaps one of the most thoroughly studied document retrieval systems so far, and discusses the potential of knowledge bases in document retrieval. He then describes a simple term-weighting strategy for the analysis of local document structures. Smith's paper discusses the expertise required for an effective search and describes a knowledge-based system, called EP-X, which can help users refine their queries. Croft gives an overview of the research being conducted in his research group at the University of Massachusetts, covering a wide range of topics from text representation to retrieval models to user modeling and interfaces. The main concern of the research is the effectiveness of the retrieval. The next paper, by Faloutsos, addresses the other end of the search problem: how to efficiently search a large number of documents.
The paper focuses on one particular text access technique, namely, the signature file. Variants of the signature file technique are presented and analyzed. Along the same line, Stanfill describes a parallel retrieval system based on the signature file. The system runs on a Connection Machine and implements a simple document ranking and relevance feedback strategy. He provides justifications for the use of large-scale parallel systems for document retrieval. Hollaar discusses his experience in the design and development of the partitioned finite state automaton (PFSA). He describes a prototype based on the PFSA concept and discusses the needs and potential of special-purpose pattern matchers in light of the rapidly falling costs of general-purpose processors. McGill and Dillon describe several major projects being conducted at OCLC. The projects include research prototypes as well as field experiments. One of the concerns in their research is converting paper documents to electronic form and providing real services to a large user community. Last but not least, Lee and Woelk describe their work on integrating text management capability into the object-oriented database ORION developed at MCC. They describe the class hierarchy for organizing textual objects and the search capability of the system.

I would like to thank the authors for accepting my invitation to contribute to this special issue. Many of them had to make time in their busy schedules in order to meet our deadline. The suggestions from Dr. Won Kim, the Editor-in-Chief, were crucial in making my task as enjoyable as it was. I hope this special issue will bring this important subject to a wider audience and that you will find the articles stimulating and interesting.
dblp:journals/debu/Smith90 fatcat:dlyur6m4wjdylanby4cy23jouu