Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
Text document classification is a fundamental technique for text analysis tasks such as e-mail filtering and patent retrieval. In this research, fuzzy PCA-based robust k-Means is applied to the extraction of document clusters so that each cluster core contains mutually related documents and the effect of noise documents is ignored. After the documents are quantified by calculating tf-idf weights of frequently used words, fuzzy PCA is performed to construct a connectivity matrix composed of the connectivity degrees among documents. Cluster structures are then intuitively recognized by re-ordering the documents according to their responsibility (the degree to which a document is not noise).
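As an illustration of the quantification step only, the following is a minimal sketch of tf-idf weighting restricted to the most frequently used words. It assumes one common tf-idf variant (term frequency normalized by document length, idf = log(N / df)); the abstract does not state which variant is used. The function name `tfidf_matrix`, the parameter `vocab_size`, and the toy documents are hypothetical. The fuzzy PCA and robust k-Means steps are not reproduced here.

```python
import math
from collections import Counter


def tfidf_matrix(docs, vocab_size=5):
    """Return (vocab, weights) for a list of tokenized documents.

    Assumed variant (not taken from the abstract):
      tf  = count of word in document / document length
      idf = log(number of documents / number of documents containing word)
    """
    # Pick the most frequently used words over the whole collection.
    # Ties are broken alphabetically so the result is deterministic.
    total = Counter(tok for doc in docs for tok in doc)
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, _ in ranked[:vocab_size]]

    n_docs = len(docs)
    # Document frequency: in how many documents each vocabulary word occurs.
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}

    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        row = [(counts[w] / length) * math.log(n_docs / df[w]) for w in vocab]
        weights.append(row)
    return vocab, weights


# Hypothetical toy collection, for illustration only.
example_docs = [
    ["fuzzy", "cluster", "fuzzy"],
    ["cluster", "noise"],
    ["patent", "cluster"],
]
example_vocab, example_weights = tfidf_matrix(example_docs, vocab_size=3)
```

Each row of `example_weights` is the feature vector of one document. A word that occurs in every document (here "cluster") receives weight zero under this idf definition, so it does not contribute to distinguishing documents.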