Information Retrieval in Digital Libraries: Bringing Search to the Net

B. R. Schatz
1997 Science  
A digital library enables users to interact effectively with information distributed across a network. These network information systems support search and display of items from organized collections. In the historical evolution of digital libraries, the mechanisms for retrieval of scientific literature have been particularly important. Grand visions in 1960 led first to the development of text search, from bibliographic databases to full-text retrieval. Next, research prototypes catalyzed the rise of document search, from multimedia browsing across local-area networks to distributed search on the Internet. By 2010, the visions will be realized, with concept search enabling semantic retrieval across large collections.

Immediate access to all scientific literature has long been a dream of scientists. The network information systems needed to support such access have steadily improved as the underlying computing and communications infrastructure has improved. The recent advent of World Wide Web searchers and digital libraries has rekindled popular interest in these issues. However, the problems and components have remained relatively unchanged since the early days of information retrieval. Thus, understanding the evolution of network search technology will place these systems in their proper historical context and aid in understanding their future.

Organized collections of scientific materials are traditionally called "libraries," and the searchable online versions of these are called "digital libraries" (1). The primary purpose of digital libraries is to enable searching of electronic collections distributed across networks, rather than merely creating electronic repositories from digitized physical materials.

Traditionally, information retrieval has been a task for professional librarians. Trained reference librarians interact with online services of specialized materials and report results to querying scientists. Although public computer networks have long been used to access specialized information services, it has taken the recent rise of the Internet to make literature searching directly available to widespread groups of scientists.

Since the beginnings of online information retrieval more than 30 years ago, the base functionality has remained essentially unchanged. A collection of literature is maintained and indexed, which the user accesses by means of a terminal connected to a server across a network.
The user specifies a query by a set of words, and all documents in the collection that contain those words are returned. The fundamental technology for searching large collections is finally changing, so that information retrieval in the next century will be far more semantic than syntactic, searching concepts rather than words (Fig.
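
The word-matching retrieval described above (a query is a set of words, and every document containing all of those words is returned) can be illustrated with a small sketch. This is not taken from the article: the inverted-index approach, the function names `build_index` and `search`, and the sample documents are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it.

    `docs` is a hypothetical dict of {doc_id: text}; tokenization is a
    plain whitespace split, which is an illustrative simplification.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the postings of the first word, then intersect the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical three-document collection, for illustration only.
docs = {
    1: "digital libraries enable network search",
    2: "text search in bibliographic databases",
    3: "network search across digital collections",
}
index = build_index(docs)
print(search(index, "network search"))
print(search(index, "bibliographic network"))
```

The matching here is purely syntactic: a document is returned only if it contains the literal query words, which is the limitation that the concept search discussed in the text is meant to overcome.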
doi:10.1126/science.275.5298.327 pmid:8994022 fatcat:tdnp7pgtmrfedkc6cbip7ickim