2023 Volume E106.D Issue 9 Pages 1528-1536
Class imbalance is one of the challenges faced in the field of machine learning. It is difficult for traditional classifiers to predict the minority class data. If the imbalanced data is not processed, the effect of the classifier will be greatly reduced. Aiming at the problem that the traditional classifier tends to the majority class data and ignores the minority class data, imbalanced data over-sampling method based on iterative self-organizing data analysis technique algorithm(ISODATA) clustering is proposed. The minority class is divided into different sub-clusters by ISODATA, and each sub-cluster is over-sampled according to the sampling ratio, so that the sampled minority class data also conforms to the imbalance of the original minority class data. The new imbalanced data composed of new minority class data and majority class data is classified by SVM and Random Forest classifier. Experiments on 12 datasets from the KEEL datasets show that the method has better G-means and F-value, improving the classification accuracy.