Using high performance systems to build collections for a digital library
Proceedings. International Conference on Parallel Processing Workshop
Nothing is more distributed than the Web, with its content spread across thousands of servers. High-performance hardware and software are essential for effective download, analysis, and organization of this content. We describe our experience using a highly parallel Web crawling system (Mercator) to automatically construct collections of scientific resources for the National Science Digital Library.
doi:10.1109/icppw.2002.1039762
dblp:conf/icppw/Bergmark02
fatcat:37bvsh6w2rbwzgz2exldjyy74m