An Overview of Web Data Clustering Practices [chapter]

Athena Vakali, Jaroslav Pokorný, Theodore Dalamagas
2004 Lecture Notes in Computer Science  
Clustering is a challenging topic in the area of Web data management. Various forms of clustering are required in a wide range of applications, including finding mirrored Web pages, detecting copyright violations, and reporting search results in a structured way. Clustering can be performed either once offline (independently of search queries) or online (on the results of search queries). Important efforts have focused on mining Web access logs and on clustering search engine results on the fly.
... Online methods based on link structure and text have been applied successfully to finding pages on related topics. This paper presents an overview of the most popular methodologies and implementations for clustering either Web users or Web sources, and surveys the current status and future trends of clustering on the Web.
doi:10.1007/978-3-540-30192-9_59 fatcat:cil7pgmogfdcbehyihgsihqx4a