Text Document Classification by Robust k-Means Based on Fuzzy Principal Component Analysis

本多 克宏; 松井 智宏; 野津 亮; 市橋 秀友

doi:10.14864/fss.25.0.147.0

Abstract

Text document classification is a fundamental technique in text analysis tasks such as e-mail filtering and patent retrieval. In this research, fuzzy PCA-based robust k-Means is applied to the extraction of document clusters, so that each cluster core contains mutually related documents and the influence of noise documents is suppressed. Documents are first quantified by calculating tf-idf weights of frequently used words. Fuzzy PCA is then performed to construct a connectivity matrix composed of the connectivity degrees among documents. Finally, the cluster structure is recognized intuitively by re-ordering the documents with their responsibility (the degree to which a document is not noise) taken into account.
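The pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: whitespace tokenization, the tf-idf form tf * log(N / df), a single principal axis found by power iteration, the membership rule u_i = exp(-d_i / lam) for the responsibility of document i (d_i is the squared residual of the document from the axis), and a connectivity degree taken as the product of two documents' principal component scores. The function names tfidf, principal_axis, fuzzy_pca, connectivity and reorder, and the parameters vocab_size, lam and rounds, are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs, vocab_size=50):
    """tf-idf weights of the vocab_size most frequent words (whitespace tokens)."""
    tokens = [d.lower().split() for d in docs]
    totals = Counter(w for t in tokens for w in t)
    vocab = [w for w, _ in totals.most_common(vocab_size)]
    # Document frequency: number of documents that contain each word.
    df = {w: sum(w in t for t in tokens) for w in vocab}
    rows = []
    for t in tokens:
        tf = Counter(t)
        rows.append([tf[w] * math.log(len(docs) / df[w]) for w in vocab])
    return vocab, rows


def principal_axis(rows, weights, iters=200):
    """Weighted mean-centering and leading principal axis by power iteration."""
    total = sum(weights)
    dim = len(rows[0])
    mean = [sum(u * r[j] for u, r in zip(weights, rows)) / total for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    axis = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # Apply the weighted scatter matrix without forming it explicitly.
        scores = [sum(a * c for a, c in zip(axis, row)) for row in centered]
        new = [sum(u * s * row[j] for u, s, row in zip(weights, scores, centered))
               for j in range(dim)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:  # degenerate data: keep the current axis
            break
        axis = [v / norm for v in new]
    return centered, axis


def fuzzy_pca(rows, lam=1.0, rounds=20):
    """Alternate between fitting the axis and updating responsibilities.

    Assumption: responsibility u_i = exp(-d_i / lam), so documents far from
    the principal axis (noise candidates) receive weights close to 0.
    """
    weights = [1.0] * len(rows)
    scores = [0.0] * len(rows)
    for _ in range(rounds):
        centered, axis = principal_axis(rows, weights)
        scores = [sum(a * c for a, c in zip(axis, row)) for row in centered]
        resid = [max(sum(c * c for c in row) - s * s, 0.0)
                 for row, s in zip(centered, scores)]
        weights = [math.exp(-d / lam) for d in resid]
    return weights, scores


def connectivity(scores):
    """Assumed connectivity degree: product of two documents' scores."""
    return [[si * sj for sj in scores] for si in scores]


def reorder(scores, weights):
    """Group documents by side of the axis, most responsible first."""
    return sorted(range(len(scores)), key=lambda i: (scores[i] < 0, -weights[i]))


if __name__ == "__main__":
    docs = ["fuzzy clustering fuzzy", "fuzzy pca",
            "spam mail spam", "spam filter mail"]
    vocab, rows = tfidf(docs, vocab_size=3)
    weights, scores = fuzzy_pca(rows)
    print(reorder(scores, weights))
```

In this sketch, re-ordering the rows and columns of the connectivity matrix with the index list returned by reorder places documents on the same side of the principal axis next to each other, which is one way the cluster structure could be inspected visually. A practical version would likely use more than one principal component and a tuned value of lam.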
