Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
One useful approach to intuitive text mining is the construction of text maps that characterize the mutual relations among text documents composed of many keywords. In text mining tasks, text documents are first preprocessed into numerical weights, such as tf-idf weights, which take into account term frequency and inverse document frequency. Principal component analysis (PCA) is used for constructing low-dimensional plots of multivariate data. It is, however, often difficult to extract meaningful features from a low-dimensional text map when the "interesting structure" is concealed by many nonsignificant keywords. This paper proposes a linear fuzzy clustering-based variable selection mechanism for selecting keywords that are useful for characterizing text documents. In the variable selection model, the absolute typicality of keywords is estimated based on a graded possibilistic approach. An experimental result with the famous Japanese novel "Kokoro" by Soseki Natsume demonstrates the characteristics of the proposed approach.
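The preprocessing and mapping steps mentioned above (tf-idf weighting followed by PCA) can be illustrated with a minimal sketch. It covers only the conventional baseline, not the fuzzy clustering-based variable selection or the graded possibilistic typicality estimation. The toy documents, the function names `tfidf_matrix` and `pca_map`, and the idf variant log(N / df) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def tfidf_matrix(docs):
    """Return (vocab, X): X has one row per document, one column per keyword."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    n_docs = len(docs)

    # Raw term frequency: how often each keyword occurs in each document.
    tf = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            tf[i, index[w]] += 1.0

    # Document frequency: in how many documents each keyword occurs.
    df = (tf > 0).sum(axis=0)
    # Assumed idf variant: log(N / df). A keyword found in every
    # document gets weight 0, i.e. it carries no discriminating power.
    idf = np.log(n_docs / df)
    return vocab, tf * idf

def pca_map(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                      # center each keyword column
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]  # principal component scores

# Hypothetical toy corpus, used only to exercise the two functions.
docs = [
    "the teacher wrote a letter",
    "the friend read the letter",
    "the teacher met the friend",
    "the sea was calm",
]
vocab, X = tfidf_matrix(docs)
coords = pca_map(X)   # one 2-D point per document: the "text map"
```

In this sketch every keyword contributes to the map, including ones with little relevance to the document structure. This is the situation in which the abstract argues that a keyword selection step becomes useful.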