A Study on Efficient Text Classification Based on Latent Semantic Used a Graph of Co-occurring Terms
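Japanese title (translated): An Approach to Effective Document Classification Based on Latent Semantics Using a Word Co-occurrence Graph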

Yukari OGURA, Ichiro KOBAYASHI
JSAI Technical Report, Type 2 SIG  
In this paper, we propose a method to raise the accuracy of text classification based on latent topics by reconsidering the techniques needed for good classification. For example, when deciding which sentences in a document are important, sentences containing important words are usually regarded as important, and tf.idf is often used to identify those words. Instead, we apply the PageRank algorithm to rank the important words in each document. Furthermore, before clustering ...s, we refine the target documents by representing each one as a collection of its important sentences. We then classify the documents based on their latent information. We employ the k-means algorithm as the clustering method and investigate how well the proposed method supports good clustering.
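The abstract gives no implementation details, so the following is only a minimal Python sketch of the pipeline it outlines, written under several assumptions of our own. Words are linked in a per-document co-occurrence graph when they appear in the same sentence, with an edge weight equal to the number of such sentences. PageRank uses a damping factor of 0.85. A sentence is kept if it contains at least one of the `top_n` highest-ranked words. No stopword filtering is applied. k-means starts deterministically from the first `k` documents. All function names (`tokenize`, `build_cooccurrence_graph`, `pagerank`, `refine_document`, `to_vector`, `kmeans`, `cluster_documents`) are hypothetical. The abstract does not say which latent-topic model the authors use, so that step is omitted here: the refined documents are clustered directly as bag-of-words vectors, and a latent model (for instance LSA or LDA) would sit between refinement and k-means.

```python
import math
from collections import defaultdict
from itertools import combinations

def tokenize(sentence):
    # Naive tokenizer (assumption): whitespace split, strip punctuation, lowercase.
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in words if w]

def build_cooccurrence_graph(sentences):
    # Undirected weighted graph; an edge's weight counts the sentences
    # in which the two words co-occur (hypothetical weighting).
    graph = {}
    for sent in sentences:
        words = sorted(set(tokenize(sent)))
        for w in words:
            graph.setdefault(w, defaultdict(float))
        for a, b in combinations(words, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph

def pagerank(graph, damping=0.85, iters=50):
    # Weighted PageRank by power iteration; total rank mass stays 1.
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iters):
        # Words with no co-occurrence edges spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for u, w in graph[v].items():
                new[u] += damping * rank[v] * w / out_weight[v]
        rank = new
    return rank

def refine_document(sentences, top_n=3):
    # Keep only sentences containing at least one of the top_n words,
    # ranked by PageRank on this document's own co-occurrence graph.
    scores = pagerank(build_cooccurrence_graph(sentences))
    top = set(sorted(scores, key=lambda w: (-scores[w], w))[:top_n])
    return [s for s in sentences if top & set(tokenize(s))]

def to_vector(sentences, vocab):
    # L2-normalised bag-of-words vector over a fixed vocabulary.
    counts = defaultdict(float)
    for s in sentences:
        for w in tokenize(s):
            counts[w] += 1.0
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iters=20):
    # Lloyd's k-means; the first k vectors are the initial centroids
    # (deterministic, for illustration only).
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_documents(docs, k, top_n=3):
    # docs: list of documents, each given as a list of sentence strings.
    refined = [refine_document(d, top_n) for d in docs]
    vocab = sorted({w for d in refined for s in d for w in tokenize(s)})
    vectors = [to_vector(d, vocab) for d in refined]
    return kmeans(vectors, k)

docs = [
    ["striker scored goal", "goal won match"],
    ["bank raised interest rates", "interest rates hit market"],
    ["keeper saved goal", "match ended goal"],
    ["market fell rates", "bank cut interest"],
]
print(cluster_documents(docs, k=2))  # [0, 1, 0, 1]
```

On this toy input the two sports documents and the two finance documents fall into separate clusters. The vocabularies are deliberately disjoint so the result is easy to check. On real text, function words such as "the" would otherwise dominate the PageRank scores, so a stopword filter in `tokenize` would be the first thing to add.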
doi:10.11517/jsaisigtwo.2013.am-03_06