A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020.
The file type is application/pdf.
A Roadmap Towards Distributed Web Assessment
[chapter]
2004
Lecture Notes in Computer Science
The webLyzard project generates empirical Web data by automatically processing large samples of Web sites. It mirrors more than 5,000 international Web sites at monthly intervals and has accumulated more than one terabyte of Web data since 1999. Structural and textual analyses convert the information contained in this sample into detailed site profiles and aggregated content representations. A distributed approach promises to increase both the sample size and the frequency of data gathering.
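The abstract does not specify how the mirroring workload would be divided in a distributed setup. The following is a minimal sketch of one possible scheme, assuming that each site is assigned to exactly one crawler node by a stable hash of its URL. The function name `assign_sites`, the node count and the example URLs are hypothetical and are not taken from the webLyzard system.

```python
import hashlib
from collections import defaultdict


def assign_sites(site_urls, num_nodes):
    """Assign each site URL to one crawler node (hypothetical scheme).

    A SHA-1 digest is used instead of Python's built-in hash(), because
    hash() on strings is salted per process and would give different
    assignments on different runs or machines.
    """
    assignment = defaultdict(list)
    for url in site_urls:
        digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
        node = int(digest, 16) % num_nodes  # node index in [0, num_nodes)
        assignment[node].append(url)
    return dict(assignment)


if __name__ == "__main__":
    # Hypothetical sample of 5,000 sites, matching the sample size in the abstract.
    sample = [f"http://site{i}.example.org" for i in range(5000)]
    plan = assign_sites(sample, num_nodes=8)
    for node in sorted(plan):
        print(node, len(plan[node]))
```

With this scheme, adding nodes reduces the number of sites each node must mirror per cycle. That reduction is what would allow a larger sample or more frequent crawls. A production system might instead use consistent hashing, so that changing the node count reassigns fewer sites.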
doi:10.1007/978-3-540-27834-4_22
fatcat:aysgavqwefayxiwexg5ipnl3ye