2005 Volume 18 Issue 9 Pages 322-330
Knowledge discovery in databases (KDD), or data mining, involves fitting models to, or determining patterns in, high-dimensional data sets, and the extraction of correlation rules plays an important role in it. This paper proposes a new approach to knowledge discovery based on linear model estimation, in which principal component analysis (PCA) is performed while selecting variables. The proposed algorithm is a hybrid of fuzzy clustering and PCA based on a lower-rank approximation of the data matrix, in which the relative responsibilities of the variables are estimated under a possibilistic constraint on the memberships. The algorithm is also extended to a local PCA model that can be used for data mining by performing both linear model estimation and stratified sampling.
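As a rough illustration of how a lower-rank approximation and possibilistic variable memberships might be combined, the following Python sketch fits a single principal component while reweighting the variables. It is not the algorithm of this paper. The one-component restriction, the exponential membership u_j = exp(-e_j / lam) (of the kind used in some possibilistic clustering variants, where memberships are not forced to sum to one), the penalty weight `lam`, and the function names `standardize` and `possibilistic_weighted_pca` are all assumptions made for illustration.

```python
import math
import random


def standardize(X):
    """Center each column and scale it to unit (population) variance."""
    n, m = len(X), len(X[0])
    cols = []
    for j in range(m):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(m)] for i in range(n)]


def possibilistic_weighted_pca(X, lam=1.0, n_outer=50, n_inner=10, seed=0):
    """Hypothetical sketch: rank-1 approximation X ~ f a^T with variable weights u.

    Alternates between (i) a weighted least-squares fit of scores f and
    loadings a, with column weights u, and (ii) resetting each weight to
    u_j = exp(-e_j / lam), where e_j is the residual sum of squares of
    variable j. The weights are not constrained to sum to one.
    """
    n, m = len(X), len(X[0])
    rng = random.Random(seed)
    u = [1.0] * m                                # every variable starts fully responsible
    a = [rng.gauss(0.0, 1.0) for _ in range(m)]  # random initial loadings
    f = [0.0] * n
    for _ in range(n_outer):
        for _ in range(n_inner):
            # Score update: weighted least squares across the variables.
            denom = sum(u[j] * a[j] ** 2 for j in range(m))
            f = [sum(u[j] * a[j] * X[i][j] for j in range(m)) / denom for i in range(n)]
            # Loading update: u_j cancels, so each column is an ordinary LS fit on f.
            ff = sum(v * v for v in f)
            a = [sum(X[i][j] * f[i] for i in range(n)) / ff for j in range(m)]
        # Per-variable residual error of the current rank-1 reconstruction.
        e = [sum((X[i][j] - f[i] * a[j]) ** 2 for i in range(n)) for j in range(m)]
        # Possibilistic-style responsibility in (0, 1]: poorly fitted variables fade out.
        u = [math.exp(-ej / lam) for ej in e]
    return u, f, a


if __name__ == "__main__":
    rng = random.Random(1)
    rows = []
    for _ in range(50):
        t = rng.gauss(0.0, 1.0)  # shared latent factor
        rows.append([t + 0.05 * rng.gauss(0, 1),
                     2 * t + 0.05 * rng.gauss(0, 1),
                     -t + 0.05 * rng.gauss(0, 1),
                     rng.gauss(0, 1)])  # last column is independent noise
    u, f, a = possibilistic_weighted_pca(standardize(rows), lam=10.0)
    print([round(v, 3) for v in u])
```

With standardized data each e_j is at most n, because a_j is a least-squares fit and a_j = 0 is always feasible, so u_j stays above exp(-n / lam); choosing `lam` as a modest fraction of n keeps the weights from underflowing. In the usage example the three correlated variables should receive weights close to 1, while the independent noise variable should be pushed toward 0, which is the kind of variable selection the abstract describes.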