Flexible intrinsic evaluation of hierarchical clustering for TDT

James Allan, Ao Feng, Alvaro Bolivar
2003 Proceedings of the twelfth international conference on Information and knowledge management - CIKM '03  
The Topic Detection and Tracking (TDT) evaluation program has included a "cluster detection" task since its inception in 1996. Systems were required to process a stream of broadcast news stories and partition them into non-overlapping clusters. A system's effectiveness was measured by comparing the generated clusters to "truth" clusters created by human annotators. Starting in 2003, TDT is moving to a more realistic model that permits overlapping clusters (stories may be on more than one topic)
more » ... and encourages the creation of a hierarchy to structure the relationships between clusters (topics). We explore a range of possible evaluation models for this modified TDT clustering task to understand the best approach for mapping between the humangenerated "truth" clusters and a much richer hierarchical structure. We demonstrate that some obvious evaluation techniques fail for degenerate cases. For a few others we attempt to develop an intuitive sense of what the evaluation numbers mean. We settle on some approaches that incorporate a strong balance between cluster errors (misses and false alarms) and the distance it takes to travel between stories within the hierarchy.
doi:10.1145/956912.956914 fatcat:lg3f3etplnevvhjhvd6nyf4dfu