Abstract
Document map construction is a useful approach to intuitive text mining, in which the mutual relations among text documents, each composed of many keywords, are represented in a 2-D map. Typically, text documents are first converted into numerical weights, such as tf-idf weights based on term frequency and inverse document frequency, and a dimension reduction technique such as principal component analysis (PCA) is then applied to construct low-dimensional plots of the multivariate data. This paper considers a linear fuzzy clustering-based variable selection mechanism for selecting keywords that are useful for characterizing documents, combined with document clustering for extracting multiple linear sub-structures. In this approach, meaningful keywords are selected in each cluster (linear sub-structure), and the mutual relations among documents are represented in simple linear sub-spaces.
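
For orientation, the conventional preprocessing pipeline mentioned above (tf-idf weighting followed by PCA) can be sketched as follows. This is only a minimal illustration of that baseline, not the fuzzy clustering-based method proposed in this paper. The toy count matrix, the function names `tfidf` and `pca_2d`, and the particular tf-idf variant (length-normalized term frequency with idf = log(N/df)) are assumptions made for the example.

```python
import numpy as np

def tfidf(counts):
    """Convert a (documents x keywords) count matrix into tf-idf weights.

    Assumed variant: tf is the count divided by document length,
    idf is log(N / df), where df is the number of documents containing the keyword.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    doc_len = counts.sum(axis=1, keepdims=True)
    tf = counts / np.maximum(doc_len, 1.0)      # guard against empty documents
    df = (counts > 0).sum(axis=0)
    idf = np.log(n_docs / np.maximum(df, 1))    # guard against unused keywords
    return tf * idf

def pca_2d(X):
    """Project the rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)                     # center each keyword column
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                        # 2-D coordinates per document

# Hypothetical toy data: 4 documents, 4 keywords (raw counts).
counts = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 1, 1, 4],
])

weights = tfidf(counts)
coords = pca_2d(weights)   # one (x, y) point per document for the map
```

Each row of `coords` gives the position of one document in the 2-D map. A single global projection of this kind uses all keywords at once, which is the setting that cluster-wise keyword selection is meant to refine.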