Host: Japan Society for Fuzzy Theory and Intelligent Informatics
Co-host: International Fuzzy Systems Association, IEEE Computational Intelligence Society Japan Chapter
Knowledge discovery in databases (KDD), or data mining, is a fundamental issue in many application fields, and the task consists of two major processes: classification of data and analysis of correlation structure. Recently, several data mining tools that can be regarded as hybrids of clustering techniques and multivariate data analysis have been proposed. In these non-linear approaches, simple linear models are used in conjunction with suitable clustering algorithms. Fuzzy c-varieties (FCV) is a linear fuzzy clustering technique that captures the local linear structures of data sets, and it is often identified with local principal component analysis because the vectors spanning the prototypes form orthonormal bases of the principal subspaces. It is, however, difficult to define the clustering criterion when data sets include not only numerical variables but also nominal variables. In this paper, we propose a clustering technique that performs FCV clustering of data sets that include categorical data. The proposed algorithm iterates the quantification of categorical data within the FCV clustering process so that the quantified scores suit the FCV clustering. Because the quantified category scores are assigned with the relationships among categories taken into account, they are useful for interpreting the cluster structure.
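For readers unfamiliar with FCV, the following is a minimal sketch of the standard FCV iteration for purely numerical data, written with NumPy. It is not the algorithm proposed in this paper: the quantification of categorical variables is omitted because the abstract does not give its update rules. The function name `fcv`, its parameters, and the toy data are assumptions made for illustration only. Each prototype is a linear variety given by a center and an orthonormal basis, and the basis is taken from the leading eigenvectors of the fuzzy scatter matrix, which is why FCV is often described as local principal component analysis.

```python
import numpy as np


def fcv(X, n_clusters=2, dim=1, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Sketch of standard fuzzy c-varieties (hypothetical helper, numeric data only).

    X          : (n_samples, n_features) array
    n_clusters : number of linear prototypes
    dim        : dimension of each prototype (1 = line, 2 = plane, ...)
    m          : fuzzifier, must be > 1
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Random initial memberships; each column (sample) sums to 1.
    U = rng.random((n_clusters, n))
    U /= U.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        W = U ** m

        # Cluster centers: membership-weighted means.
        centers = (W @ X) / W.sum(axis=1, keepdims=True)

        bases = np.empty((n_clusters, d, dim))
        D2 = np.empty((n_clusters, n))
        for c in range(n_clusters):
            R = X - centers[c]
            # Fuzzy scatter matrix of cluster c.
            S = (W[c][:, None] * R).T @ R
            # eigh returns eigenvalues in ascending order, so the last
            # `dim` eigenvectors span the local principal subspace.
            _, vecs = np.linalg.eigh(S)
            bases[c] = vecs[:, -dim:]
            # Squared distance from each sample to the linear variety.
            proj = R @ bases[c]
            D2[c] = (R ** 2).sum(axis=1) - (proj ** 2).sum(axis=1)

        # Guard against zero or slightly negative distances from round-off.
        D2 = np.maximum(D2, 1e-12)

        # Membership update of the fuzzy c-means family.
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return U, centers, bases


# Toy usage: points scattered around two lines in the plane.
rng = np.random.default_rng(1)
t = np.linspace(-1.0, 1.0, 40)
line_a = np.column_stack([t, 2.0 * t + 1.0])
line_b = np.column_stack([t, -t - 1.0])
X = np.vstack([line_a, line_b]) + 0.02 * rng.standard_normal((80, 2))

U, centers, bases = fcv(X, n_clusters=2, dim=1)
```

In this sketch, `U[c, i]` is the membership of sample `i` in cluster `c`, and `bases[c]` holds the orthonormal vectors spanning prototype `c`. Like other alternating-optimization clustering methods, the result depends on the random initialization, so several restarts may be needed in practice.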