Using String Kernel for Document Clustering

Qingwei Shi, Xiaodong Qiao, Guangquan Xu
2010 International Journal of Information Technology and Computer Science  
In this paper, we present a string-kernel-based method for document clustering. Documents are viewed as sequences of strings, and document similarity is computed with the kernel function. Based on these similarities, a spectral clustering algorithm is used to group the documents. Experimental results show that the string kernel method outperforms the standard k-means algorithm on the Reuters-21578 dataset.
doi:10.5815/ijitcs.2010.02.06 fatcat:ptj7ncvnrrdy7az3qu6aiqpday
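
The pipeline described in the abstract (string-kernel similarity followed by spectral clustering) can be illustrated with a minimal sketch. The abstract does not say which string kernel or which spectral clustering variant the authors use, so everything here is an assumption made for illustration: a normalized character k-spectrum kernel stands in for the string kernel, and a two-way spectral split (the sign of the second eigenvector of the normalized affinity matrix, found by power iteration) stands in for the clustering step. The function names, the choice k=3, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def spectrum_features(text, k=3):
    """Count every contiguous character k-gram in text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def string_kernel(a, b, k=3):
    """Normalized k-spectrum kernel: cosine of the two k-gram count vectors.

    Assumed stand-in for the paper's string kernel, which the abstract
    does not specify.
    """
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    dot = sum(count * fb[gram] for gram, count in fa.items())
    norm = math.sqrt(sum(c * c for c in fa.values()) *
                     sum(c * c for c in fb.values()))
    return dot / norm if norm else 0.0


def spectral_bipartition(docs, k=3, iters=200, seed=0):
    """Split docs into two groups using the kernel matrix as affinity.

    Assumes every document has at least k characters, so each row of the
    kernel matrix has a positive sum.
    """
    n = len(docs)
    K = [[string_kernel(a, b, k) for b in docs] for a in docs]
    deg = [sum(row) for row in K]
    # Symmetrically normalized affinity M = D^(-1/2) K D^(-1/2).
    M = [[K[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of M is proportional to sqrt(deg).
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]
    # Power iteration with the leading eigenvector projected out, which
    # converges to the second eigenvector because M is positive semidefinite.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        proj = sum(x * t for x, t in zip(v, top))
        v = [x - proj * t for x, t in zip(v, top)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        v = [x / norm for x in v]
    # The sign of each entry gives the cluster; label values are arbitrary.
    return [1 if x > 0 else 0 for x in v]


if __name__ == "__main__":
    docs = [
        "crude oil prices rise as opec cuts supply",
        "opec cuts crude oil supply and prices rise",
        "wheat harvest boosts grain exports",
        "grain exports grow after wheat harvest",
    ]
    print(spectral_bipartition(docs))
```

For more than two clusters, a common approach is to take several leading eigenvectors of the normalized affinity matrix and run k-means on the rows of that embedding; the two-way split above is only the simplest case and is not claimed to match the authors' setup or their Reuters-21578 experiments.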