A Roadmap Towards Distributed Web Assessment [chapter]

Arno Scharl
2004 Lecture Notes in Computer Science  
The webLyzard project generates empirical Web data by automatically processing large samples of Web sites. It mirrors more than 5,000 international Web sites at monthly intervals and has amassed more than one terabyte of Web data since 1999. Structural and textual analyses convert the wealth of information contained in the sample into detailed site profiles and aggregated content representations. A distributed approach promises to increase both sample size and the frequency of data gathering.
This paper presents a roadmap towards distributed Web assessment, extending and revising the current system architecture to enhance its scalability and flexibility for investigating the dynamics of electronic content.
doi:10.1007/978-3-540-27834-4_22 fatcat:aysgavqwefayxiwexg5ipnl3ye