Parallel computing in information retrieval – an updated review

A. Macfarlane, S.E. Robertson, J.A. McCann
1997 Journal of Documentation  
This is the accepted version of the paper. This version of the publication may differ from the final published version. Permanent repository link: http://openaccess.city.ac.uk/4463/ Link to published version: http://dx.

The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular we stress the importance of the motivation in using parallel computing for Text Retrieval. We analyse parallel IR systems using a classification due to Rasmussen [1] and describe some ... parallel IR systems. We give a description of the retrieval models used in parallel Information Processing. We describe areas of research which we believe are needed.

[FIGURE 1 - Types of memory organisation examples. Label: Network]

An example of a machine with the MIMD class architecture is the Fujitsu AP1000, which is described in section 2.2 below. It should be noted that a further class of architecture exists which does not fit well into Flynn's classification. Special-purpose hardware has been built to accommodate IR systems [8], including associative memories, finite state machines and cellular arrays [9]. Some of this work has been in building special-purpose parallel architectures [10] for text retrieval, and we include it in the review for completeness.

Parallel architectures used in IR

We now turn to specific machine architectures which have been used for parallel IR systems. We give an example of each type of architecture from section 2.1: the DAP, the Fujitsu AP1000, and special parallel hardware. We also discuss the growing impact of networked workstation technology. More information on various architectures can be found in Rasmussen [1].

A. DAP (Distributed Array of Processors).

The AMT (formerly ICL) DAP is a SIMD class architecture. The DAP [7] organisation is an array of 1-bit processing elements (PEs) arranged in a 32 by 32 matrix for the 500 series and 64 by 64 for the 600 series: 1024 and 4096 PEs in total, respectively. The 600 series has four times the memory and processing power of the 500 series. Each processor is connected to its north, south, east and west neighbour processors (known as a NEWS grid) and to the row and column of the matrix by a bus system. Each processor has at least 32 Kbits of its own local memory. The ICL DAP needed a mainframe as a front end, but workstations can be used for current varieties.
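The array geometry described above can be made concrete with a small sketch. It checks the PE totals for the two series and looks up the NEWS neighbours of a given PE. The names `DAP_SIDE`, `pe_count` and `news_neighbours` are hypothetical, and the treatment of PEs on the edge of the array (no neighbour in that direction) is an assumption, since the text does not say how edges are handled.

```python
from typing import Dict, Optional, Tuple

# Array side lengths taken from the text: 32 for the 500 series, 64 for the 600 series.
DAP_SIDE = {"500": 32, "600": 64}

Coord = Tuple[int, int]


def pe_count(series: str) -> int:
    """Total number of processing elements in the square array of a series."""
    side = DAP_SIDE[series]
    return side * side


def news_neighbours(row: int, col: int, side: int) -> Dict[str, Optional[Coord]]:
    """Return the north, east, west and south neighbours of PE (row, col).

    Assumption: a PE on the edge of the array has no neighbour in that
    direction (None). The source text does not describe edge behaviour.
    """
    if not (0 <= row < side and 0 <= col < side):
        raise ValueError("PE coordinates lie outside the array")

    def on_grid(r: int, c: int) -> Optional[Coord]:
        # Keep the coordinate only if it falls inside the side x side array.
        return (r, c) if 0 <= r < side and 0 <= c < side else None

    return {
        "north": on_grid(row - 1, col),
        "east": on_grid(row, col + 1),
        "west": on_grid(row, col - 1),
        "south": on_grid(row + 1, col),
    }
```

Under these assumptions, `pe_count("500")` and `pe_count("600")` reproduce the 1024 and 4096 totals given in the text, and the fourfold difference in PE count matches the stated ratio between the two series.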
The architecture has a Master Control Unit (MCU) which broadcasts instructions and data for the array to work on and collects the results from the array. The DAP has very fast I/O capabilities of up to 50 Mbytes per second to overcome the I/O bottleneck (the I/O problem in parallel computing for IR is discussed in section 2.3 below). The DAP is successfully used by the DapText system described by Reddaway [11], which is included in the case studies section (7.1) below. Reuters use this system for their Text Retrieval purposes. DapText has been implemented on both the 500 and 600 series of the DAP. Other work includes a British Library project for using the DAP in IR, described in [12] [13] [14] [15].
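The broadcast-and-collect role of the MCU can be illustrated with a toy simulation. This is only a sketch of the SIMD idea, not the DAP's actual instruction set or the DapText implementation: the classes `ProcessingElement` and `MasterControlUnit` are hypothetical, each PE is reduced to a single integer of local memory, and a sequential loop stands in for lockstep execution.

```python
from typing import Callable, List


class ProcessingElement:
    """One PE with a single word of local memory (a hypothetical simplification)."""

    def __init__(self, local_value: int) -> None:
        self.local_value = local_value
        self.result = 0

    def execute(self, instruction: Callable[[int, int], int], operand: int) -> None:
        # Every PE applies the same instruction to its own local data.
        self.result = instruction(self.local_value, operand)


class MasterControlUnit:
    """Broadcasts one instruction to all PEs, then gathers their results."""

    def __init__(self, pes: List[ProcessingElement]) -> None:
        self.pes = pes

    def broadcast(self, instruction: Callable[[int, int], int], operand: int) -> None:
        # A real SIMD array runs in lockstep; this loop only simulates that.
        for pe in self.pes:
            pe.execute(instruction, operand)

    def collect(self) -> List[int]:
        return [pe.result for pe in self.pes]


# Usage: ask every PE whether its local value equals the broadcast operand.
pes = [ProcessingElement(v) for v in (3, 7, 7, 1)]
mcu = MasterControlUnit(pes)
mcu.broadcast(lambda local, operand: int(local == operand), 7)
matches = mcu.collect()
```

The point of the sketch is the control pattern: a single instruction stream from the controller, with each PE working only on its own local memory.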
doi:10.1108/eum0000000007201 fatcat:2zuwtehixbd6xk33hwb3j43nse