Comparing Taxonomies for Organising Collections of Documents

Samuel Fernando, Mark M. Hall, Eneko Agirre, Aitor Soroa, Paul D. Clough, Mark Stevenson
2012 International Conference on Computational Linguistics  
There is a demand for taxonomies to organise large collections of documents into categories for browsing and exploration. This paper examines four existing taxonomies that have been manually created, along with two methods for deriving taxonomies automatically from data items. We use these taxonomies to organise items from a large online cultural heritage collection. We then present two human evaluations of the taxonomies. The first measures the cohesion of the taxonomies to determine how well they group together similar items under the same concept node. The second analyses the concept relations in the taxonomies. The results show that the manual taxonomies have high-quality, well-defined relations. However, the novel automatic method is found to generate very high cohesion.
dblp:conf/coling/FernandoHASCS12 fatcat:5753cgj6inhh5pnr6btau2l4v4