Empirical comparison of graph classification algorithms

Nikhil S. Ketkar, Lawrence B. Holder, Diane J. Cook
2009 2009 IEEE Symposium on Computational Intelligence and Data Mining  
The graph classification problem is learning to classify separate, individual graphs in a graph database into two or more categories. A number of algorithms have been introduced for the graph classification problem. We present an empirical comparison of the major approaches for graph classification introduced in the literature, namely SubdueCL, frequent subgraph mining in conjunction with SVMs, walk-based graph kernels, frequent subgraph mining in conjunction with AdaBoost, and DT-CLGBI. Experiments are performed on five real-world data sets from the Mutagenesis and Predictive Toxicology domains, which are considered benchmark data sets for the graph classification problem. Additionally, experiments are performed on a corpus of artificial data sets constructed to investigate the performance of the algorithms across a variety of parameters of interest. Our conclusions are as follows. In data sets where the underlying concept has a high average degree, walk-based graph kernels perform poorly compared to other approaches. The hypothesis space of the kernel is walks, which is insufficient for capturing concepts involving significant structure. In data sets where the underlying concept is disconnected, SubdueCL performs poorly compared to other approaches. The hypothesis space of SubdueCL is connected graphs, which is insufficient for capturing concepts that consist of a disconnected graph. FSG+SVM, FSG+AdaBoost, and DT-CLGBI have comparable performance in most cases.
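To make the remark about walks as a hypothesis space more concrete, the following is a minimal sketch of a bounded-length walk-counting kernel computed on the direct product of two labeled graphs. It is not the implementation evaluated in the paper: the function names (`make_graph`, `walk_kernel`), the `max_len` cutoff, and the toy graphs are all assumptions made for illustration. The example compares a 6-cycle with two disjoint triangles, a pair that differs in structure yet has identical walk counts when every vertex carries the same label.

```python
def make_graph(labels, edges):
    """Build an undirected labeled graph as (labels, adjacency).

    labels: dict mapping vertex -> label; edges: iterable of (u, v) pairs.
    Hypothetical helper, used only for this illustration.
    """
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return labels, adj


def walk_kernel(g1, g2, max_len=3):
    """Count pairs of label-matching walks of length 0..max_len.

    Walks are advanced in lockstep on the direct product graph, whose
    vertices are pairs (u, v) with equal labels. Edge labels are ignored
    in this simplified sketch.
    """
    labels1, adj1 = g1
    labels2, adj2 = g2
    pairs = [(u, v) for u in labels1 for v in labels2
             if labels1[u] == labels2[v]]
    # counts[p] = number of common walks of the current length ending at p
    counts = {p: 1 for p in pairs}
    total = sum(counts.values())  # common walks of length 0
    for _ in range(max_len):
        nxt = {p: 0 for p in pairs}
        for (u, v), c in counts.items():
            if c == 0:
                continue
            for u2 in adj1[u]:
                for v2 in adj2[v]:
                    if (u2, v2) in nxt:  # labels of the new pair must match
                        nxt[(u2, v2)] += c
        counts = nxt
        total += sum(counts.values())
    return total


carbon = {i: "C" for i in range(6)}
cycle6 = make_graph(carbon, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
two_triangles = make_graph(carbon, [(0, 1), (1, 2), (2, 0),
                                    (3, 4), (4, 5), (5, 3)])
probe = make_graph({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2)])

# Both graphs are 2-regular on six vertices, so their walk counts agree
# at every length and the kernel cannot tell them apart.
same = walk_kernel(cycle6, probe) == walk_kernel(two_triangles, probe)
print(same)
```

Under these assumptions the two kernel values coincide for any probe graph with the same uniform labeling, which gives one simple picture of how walk-based features can miss structural differences. The sketch does not reproduce the paper's experiments on high-degree concepts.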
doi:10.1109/cidm.2009.4938658 dblp:conf/cidm/KetkarHC09 fatcat:n7jrryldgzd7hakycnvjtb3jce