Distributed IR for Digital Libraries [chapter]

Ray R. Larson
2003 Lecture Notes in Computer Science  
This paper examines technology developed to support large-scale distributed digital libraries. We describe the method used for harvesting collection information using standard information retrieval protocols and how this information is used in collection ranking and retrieval. The system that we have developed takes a probabilistic approach to distributed information retrieval, using a logistic regression algorithm for estimation of distributed collection relevance and fusion techniques to ... multiple sources of evidence. We discuss the harvesting method used and how it can be employed in building collection representatives using features of the Z39.50 protocol. The extracted collection representatives are ranked using a fusion of probabilistic retrieval methods. The effectiveness of our algorithm is compared to other distributed search methods using test collections developed for distributed search evaluation. We also describe how this system is currently being applied to operational systems in the U.K.
doi:10.1007/978-3-540-45175-4_44 fatcat:g2c7c6p4era37hk26d3hkhsioq
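
As a rough illustration of the collection-ranking idea summarized in the abstract, the sketch below scores each collection with a logistic regression model and sorts collections by estimated probability of relevance. The feature names, the coefficient values, and the function names are all hypothetical and chosen only for this example; they are not the fitted model or the features reported in the paper.

```python
import math

# Hypothetical coefficients for illustration only; these are NOT the
# values fitted in the paper.
INTERCEPT = -3.0
WEIGHTS = {
    "mean_log_query_tf": 1.2,   # assumed feature: query term frequency
    "mean_log_idf": 0.8,        # assumed feature: inverse frequency
    "matching_terms": 0.5,      # assumed feature: number of matching terms
}


def collection_relevance(features):
    """Estimate the probability that a collection is relevant to a query."""
    # Linear combination of the features gives the log-odds of relevance.
    log_odds = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    # The logistic function maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_collections(collections):
    """Return (name, probability) pairs sorted from most to least relevant."""
    scored = [(name, collection_relevance(f)) for name, f in collections.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_collections` with a dictionary that maps collection names to feature dictionaries returns the collections in descending order of estimated relevance, which a distributed search client could use to decide which servers to query first.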