IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532

A formally published version of this article is available. Please refer to the published version, and cite the published version when citing this work.

A Boosting Method Based on Center-of-gravity Oversampling and Pruning for Classifying Imbalanced Data
Fengqi GUO, Qicheng LIU
Journal, Free access, Advance online publication

Article ID: 2024EDP7147

Abstract

Data imbalance frequently occurs in sectors such as healthcare, security, and finance, and it substantially increases the difficulty of classification. To address the tendency of existing binary imbalanced-data classification techniques to alter the original data distribution, and to enhance classifier performance, this paper introduces GAPBoost, a boosting method based on center-of-gravity oversampling and pruning for classifying imbalanced data in binary classification scenarios. The algorithm first partitions all minority-class instances into k distinct clusters using K-means clustering. It then performs center-of-gravity oversampling on clusters with enough instances to form a triangle, and generates new instances by interpolation in clusters containing only two minority-class instances. Pruning is subsequently applied to eliminate noisy data from both the majority and minority classes, and the AdaBoost algorithm is run on the denoised training set to improve classifier performance. Ten-fold stratified cross-validation experiments comparing the GAPBoost algorithm with several classic ensemble algorithms are performed on 20 benchmark imbalanced datasets, using AUC, F1, and G-mean as performance evaluation criteria. The experimental results indicate that the GAPBoost algorithm effectively handles the classification of binary imbalanced datasets and outperforms the other ensemble algorithms on all three evaluation metrics.
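The oversampling stage described above can be illustrated with a minimal sketch. The code below is only one plausible reading of the abstract, not the authors' implementation: it assumes that "center-of-gravity oversampling" places a synthetic point at the centroid of three minority-class samples drawn from the same K-means cluster, and that two-instance clusters are handled by linear interpolation between the pair; the function name gravity_oversample and its parameters are hypothetical.

```python
# Hypothetical sketch of the GAPBoost oversampling stage (assumptions noted above);
# the pruning and AdaBoost steps that follow in the paper are omitted here.
import numpy as np
from sklearn.cluster import KMeans

def gravity_oversample(X_min, k=5, n_new=100, random_state=0):
    """Generate n_new synthetic minority samples from X_min (shape [n, d])."""
    rng = np.random.default_rng(random_state)
    k = min(k, len(X_min))                         # no more clusters than points
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=random_state).fit_predict(X_min)
    clusters = [X_min[labels == c] for c in range(k)]
    usable = [c for c in clusters if len(c) >= 2]  # singleton clusters yield nothing

    synthetic = []
    while len(synthetic) < n_new and usable:
        cluster = usable[rng.integers(len(usable))]
        if len(cluster) >= 3:
            # center of gravity of a random triangle of three cluster members
            idx = rng.choice(len(cluster), size=3, replace=False)
            synthetic.append(cluster[idx].mean(axis=0))
        else:
            # two-instance cluster: random interpolation between the pair
            t = rng.random()
            synthetic.append(cluster[0] + t * (cluster[1] - cluster[0]))
    return np.array(synthetic)

# Example usage: augment the minority class before training a boosted ensemble
# X_min = X[y == 1]
# X_aug = np.vstack([X_min, gravity_oversample(X_min, k=5, n_new=200)])
```

In the full method, the augmented and pruned training set would then be passed to an AdaBoost classifier, as the abstract states.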

© 2024 The Institute of Electronics, Information and Communication Engineers