Clustering Ontology-enriched Graph Representation for Biomedical Documents based on Scale-Free Network Theory

Illhoi Yoo, Xiaohua Hu
2006 3rd International IEEE Conference Intelligent Systems
Abstract: In this paper we introduce a novel document clustering approach that solves some major problems of traditional document clustering approaches. Instead of depending on the traditional vector space model, this approach represents documents as graphs using domain knowledge from an ontology, because graphs can represent the semantic relationships among the concepts in documents. Based on scale-free network theory, our approach generates a model for each document cluster from the ontology-enriched graph representation by identifying k high density subgraphs that capture the core semantic relationship information about each document cluster. Using these k high density subgraphs, each document is assigned to a proper document cluster. Our extensive experimental results on MEDLINE articles show that our approach outperforms two leading document clustering algorithms, BiSecting K-means and CLUTO's vcluster. Moreover, our approach provides a meaningful explanation for document clustering through the generated models. This explanation helps users understand the clustering results and the documents as a whole.
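The pipeline outlined in the abstract (concept graph, hub-centered dense subgraphs, document assignment) could be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' algorithm: plain concept lists stand in for ontology-mapped documents, weighted degree is an assumed hub criterion, a hub together with its neighbors is an assumed stand-in for a "high density subgraph", and all function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(docs):
    """Build an undirected co-occurrence graph: concept -> {neighbor: weight}."""
    graph = defaultdict(lambda: defaultdict(int))
    for concepts in docs:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def hub_models(graph, k):
    """Pick up to k hub concepts by weighted degree (assumed criterion).

    Each model is a hub plus its neighbors. Concepts already covered by an
    earlier model are skipped so the models stay distinct.
    """
    strength = {c: sum(nbrs.values()) for c, nbrs in graph.items()}
    models, covered = [], set()
    # Highest weighted degree first; ties broken alphabetically.
    for concept in sorted(strength, key=lambda c: (-strength[c], c)):
        if concept in covered:
            continue
        model = {concept} | set(graph[concept])
        models.append(model)
        covered |= model
        if len(models) == k:
            break
    return models


def assign(doc, models):
    """Return the index of the model sharing the most concepts with doc."""
    overlaps = [len(set(doc) & m) for m in models]
    return max(range(len(models)), key=overlaps.__getitem__)


# Hypothetical toy corpus: each document is a list of extracted concepts.
docs = [
    ["insulin", "glucose", "diabetes"],
    ["diabetes", "insulin", "obesity"],
    ["diabetes", "glucose"],
    ["tumor", "chemotherapy", "carcinoma"],
    ["carcinoma", "tumor", "metastasis"],
    ["carcinoma", "chemotherapy"],
]

models = hub_models(build_graph(docs), k=2)
labels = [assign(doc, models) for doc in docs]
print(models)
print(labels)
```

In this sketch the concepts in each model double as a human-readable description of the cluster, which loosely mirrors the explanatory role the abstract attributes to the generated models.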
doi:10.1109/is.2006.348532 fatcat:jfe3vidtknhdzbvkyntr3smbme