IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
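<Speech and Image Processing, Recognition>
Automatic Text Classification of English Newswire Articles Based on Statistical Classification Techniques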
祖 国威 (Guowei Zu), 大山 航 (Wataru Ohyama), 若林 哲史 (Tetsushi Wakabayashi), 木村 文隆 (Fumitaka Kimura)
Journal, free access

2004, Volume 124, Issue 3, pp. 852-860

Abstract

Automatic text classification consists of learning a classification scheme from training examples and then using it to classify unseen text documents. This is essentially the same process as graphic or character pattern recognition, so pattern recognition approaches can be applied to automatic text categorization. In this research, several statistical classification techniques were applied to automatic text classification: Euclidean distance, various similarity measures, the linear discriminant function, projection distance, modified projection distance, SVM, and nearest neighbor. Principal component analysis was used to reduce the dimensionality of the feature vectors. Comparative experiments were conducted on the Reuters-21578 test collection of English newswire articles. The results show that the modified projection distance performs better overall than the other methods, and that principal component analysis is suitable for reducing the dimensionality of the text features.
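
As an illustration of the simplest technique named in the abstract, the following sketch classifies a document by its Euclidean distance to per-class mean vectors. It is not the authors' implementation. The whitespace tokenizer, the relative term-frequency features, the toy training set, and every function name are assumptions made for this example, and the principal component analysis step is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Map each distinct training token to a feature-vector index."""
    vocab = {}
    for text in docs:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_vector(text, vocab):
    """Relative term-frequency vector; out-of-vocabulary tokens are ignored."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    total = sum(counts.values()) or 1  # avoid division by zero
    vec = [0.0] * len(vocab)
    for tok, count in counts.items():
        vec[vocab[tok]] = count / total
    return vec


def class_means(train, vocab):
    """Average the feature vectors of the training documents in each class."""
    sums = {}
    sizes = Counter()
    for text, label in train:
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, value in enumerate(to_vector(text, vocab)):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(text, means, vocab):
    """Return the label whose class mean is nearest to the document vector."""
    vec = to_vector(text, vocab)
    return min(means, key=lambda label: euclidean(vec, means[label]))


# Toy training set, made up for illustration (not Reuters-21578 data).
TRAIN = [
    ("wheat corn harvest export grain", "grain"),
    ("grain shipment wheat prices", "grain"),
    ("crude oil barrel opec output", "crude"),
    ("oil prices crude refinery", "crude"),
]

VOCAB = build_vocabulary(text for text, _ in TRAIN)
MEANS = class_means(TRAIN, VOCAB)
```

With this toy data, `classify("wheat harvest grain", MEANS, VOCAB)` returns `"grain"` and `classify("opec crude oil output", MEANS, VOCAB)` returns `"crude"`. Replacing `euclidean` with a similarity measure, and `min` with `max`, would give the similarity-based variant of the same scheme.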

© The Institute of Electrical Engineers of Japan 2004