An Approach to Effective Document Classification Based on Latent Semantics Using a Word Co-occurrence Graph

小倉 由佳里; 小林 一郎

doi:10.11517/jsaisigtwo.2013.AM-03_06

Abstract

In this paper, we propose a method to improve the accuracy of text classification based on latent topics by reconsidering the techniques needed for good classification. For example, when selecting the important sentences in a document, sentences that contain important words are usually regarded as important, and tf-idf is often used to identify those words. In contrast, we apply the PageRank algorithm to rank the important words in each document. Furthermore, before clustering, we refine the target documents by representing each one as the collection of its important sentences. We then classify the documents based on their latent information. As the clustering method, we employ the k-means algorithm and investigate how well the proposed method supports good clustering.
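
The word-ranking and sentence-selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the co-occurrence graph is built, so the window size, the damping factor of 0.85, the fixed iteration count, and the function names `build_cooccurrence_graph`, `pagerank`, and `top_sentences` are all assumptions made for illustration. The latent-topic and k-means stages are not shown.

```python
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Link each word to the next `window` tokens in the same sentence.

    The window size is an assumption; the abstract does not state one.
    """
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph.

    Every node in `graph` has at least one neighbour, so there are
    no dangling nodes to handle.
    """
    nodes = list(graph)
    n = len(nodes)
    scores = {w: 1.0 / n for w in nodes}
    for _ in range(iterations):
        updated = {}
        for w in nodes:
            # Each neighbour shares its score equally among its own links.
            incoming = sum(scores[v] / len(graph[v]) for v in graph[w])
            updated[w] = (1 - damping) / n + damping * incoming
        scores = updated
    return scores


def top_sentences(sentences, scores, k=1):
    """Rank sentences by the summed PageRank of their distinct words."""
    def weight(tokens):
        return sum(scores.get(w, 0.0) for w in set(tokens))
    return sorted(sentences, key=weight, reverse=True)[:k]


# A toy, already-tokenized document (hypothetical data).
document = [
    ["graph", "ranking", "finds", "important", "words"],
    ["important", "words", "mark", "important", "sentences"],
    ["weather", "today"],
]

graph = build_cooccurrence_graph(document)
scores = pagerank(graph)
best = top_sentences(document, scores, k=1)
print(sorted(scores, key=scores.get, reverse=True)[:3])
print(best)
```

In a full pipeline along the lines of the abstract, the retained sentences would replace each original document before topic modeling and k-means clustering. Summing word scores favors longer sentences; normalizing by sentence length is one alternative, and which choice the authors made is not stated here.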
