IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Imbalanced Data Over-Sampling Method Based on ISODATA Clustering
Zhenzhe LVQicheng LIU
著者情報
ジャーナル フリー

2023 年 E106.D 巻 9 号 p. 1528-1536

詳細
抄録

Class imbalance is one of the challenges faced in the field of machine learning. It is difficult for traditional classifiers to predict the minority class data. If the imbalanced data is not processed, the effect of the classifier will be greatly reduced. Aiming at the problem that the traditional classifier tends to the majority class data and ignores the minority class data, imbalanced data over-sampling method based on iterative self-organizing data analysis technique algorithm(ISODATA) clustering is proposed. The minority class is divided into different sub-clusters by ISODATA, and each sub-cluster is over-sampled according to the sampling ratio, so that the sampled minority class data also conforms to the imbalance of the original minority class data. The new imbalanced data composed of new minority class data and majority class data is classified by SVM and Random Forest classifier. Experiments on 12 datasets from the KEEL datasets show that the method has better G-means and F-value, improving the classification accuracy.

著者関連情報
© 2023 The Institute of Electronics, Information and Communication Engineers
前の記事 次の記事
feedback
Top