Abstract
This paper discusses the application of the fuzzy c-means (FCM) based classifier to large-scale data sets. The first type of large-scale data set contains a huge number of samples (patterns). Their number can be reduced by sampling, but the classifier's accuracy on the test set may then deteriorate, as may its accuracy on the available data. The FCM classifier uses covariance matrices whose size does not grow with the number of training samples, and its training time is proportional to the number of samples. Compared with the support vector machine (SVM) classifier, which is known as one of the highest-performing classifiers, the FCM classifier nearly matches the accuracy of SVM while requiring less training time and less testing time. If the feature dimension of the samples is relatively small, or if it can be reduced by principal component analysis (PCA), the training of the FCM classifier converges quickly. However, if the feature dimension is large, the covariance matrices cannot be stored in computer memory and the computation becomes infeasible. The paper therefore proposes a modified algorithm to cope with high-dimensional feature data. As an example, a subset of the COREL image database is used to compare its performance with that of the approach that compresses the data set with PCA.
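The two scaling claims in the abstract can be illustrated with a minimal sketch. It assumes a Gustafson-Kessel-style fuzzy covariance (membership-weighted scatter about each fixed cluster center), which may differ from the authors' exact formulation. The names `fuzzy_covariances` and `covariance_bytes` are hypothetical. The samples are consumed in a single pass, and only c·d² accumulators are kept, so memory does not depend on the number of samples N. That same c·d² term is what becomes infeasible when d is large.

```python
from typing import Iterable, List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def fuzzy_covariances(samples: Iterable[Vector], centers: Sequence[Vector],
                      m: float = 2.0, eps: float = 1e-12) -> List[Matrix]:
    """Membership-weighted covariance per cluster, computed in one pass.

    Assumption: Gustafson-Kessel-style fuzzy covariance with fixed centers.
    Only c*d*d accumulators plus c weight sums are kept, so memory does not
    depend on the number of samples; `samples` may be a generator.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    c, d = len(centers), len(centers[0])
    weight_sum = [0.0] * c
    scatter = [[[0.0] * d for _ in range(d)] for _ in range(c)]
    for x in samples:
        # Squared distances to each center (clamped to avoid division by zero).
        dist2 = [max(sum((xi - vi) ** 2 for xi, vi in zip(x, v)), eps)
                 for v in centers]
        for i in range(c):
            # Standard FCM membership written with squared distances:
            # u_i = 1 / sum_j (dist2_i / dist2_j) ** (1 / (m - 1))
            u = 1.0 / sum((dist2[i] / dist2[j]) ** (1.0 / (m - 1.0))
                          for j in range(c))
            w = u ** m
            weight_sum[i] += w
            diff = [xi - vi for xi, vi in zip(x, centers[i])]
            for p in range(d):
                for q in range(d):
                    scatter[i][p][q] += w * diff[p] * diff[q]
    # Normalize each scatter matrix by its total membership weight.
    return [[[s / weight_sum[i] for s in row] for row in scatter[i]]
            for i in range(c)]


def covariance_bytes(c: int, d: int, itemsize: int = 8) -> int:
    """Memory for c dense d x d matrices of `itemsize`-byte floats."""
    return c * d * d * itemsize
```

For example, `covariance_bytes(10, 4096)` returns 1,342,177,280 bytes (about 1.3 GB) for ten float64 covariance matrices. Storage grows quadratically in d, so reducing d with PCA, or avoiding dense d×d matrices altogether, is what keeps the computation tractable.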