IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
A Boosting Method Based on Center-of-Gravity Oversampling and Pruning for Classifying Imbalanced Data
Fengqi GUO, Qicheng LIU

2025 Volume E108.D Issue 6 Pages 570-582

Abstract

Data imbalance occurs frequently in many sectors, including healthcare, security, and finance, and it substantially increases the difficulty of classification. Existing techniques for binary imbalanced data classification tend to alter the data distribution. To address this issue and improve classifier performance, this paper introduces GAPBoost, a boosting method for binary imbalanced data classification based on center-of-gravity oversampling and pruning. The algorithm first partitions the minority class instances into k clusters using K-means clustering. It then performs center-of-gravity oversampling on clusters with enough instances to form a triangle, and generates new instances by interpolation on clusters containing only two minority class instances. Next, pruning removes noisy data from both the majority and minority classes, and the AdaBoost algorithm is applied to the denoised training set to improve classifier performance. GAPBoost and several classic ensemble algorithms are compared in ten-fold stratified cross-validation experiments on 20 benchmark imbalanced datasets, using AUC, F1, and G-mean as evaluation criteria. The experimental results indicate that GAPBoost handles binary imbalanced classification effectively and outperforms the other ensemble algorithms on the three evaluation metrics.
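
The oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how triangles are chosen within a cluster, so drawing three random instances per synthetic point is an assumption, and the helper names `center_of_gravity` and `oversample_cluster` are hypothetical. The sketch assumes the minority class has already been split into clusters (for example by K-means) and handles one cluster at a time.

```python
import random


def center_of_gravity(points):
    """Return the coordinate-wise mean (centroid) of a list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def oversample_cluster(cluster, n_new, rng):
    """Generate n_new synthetic minority instances from one cluster.

    Assumption (not stated in the abstract): for clusters with at least
    three instances, each synthetic point is the center of gravity of a
    triangle formed by three randomly drawn instances.
    """
    if len(cluster) >= 3:
        # Enough instances to form a triangle: use its center of gravity.
        return [center_of_gravity(rng.sample(cluster, 3)) for _ in range(n_new)]
    if len(cluster) == 2:
        # Only two instances: interpolate on the segment between them.
        a, b = cluster
        synthetic = []
        for _ in range(n_new):
            t = rng.random()
            synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
        return synthetic
    # A single instance gives nothing to interpolate with in this sketch.
    return []


rng = random.Random(0)
triangle_cluster = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
pair_cluster = [[0.0, 0.0], [2.0, 2.0]]
from_triangle = oversample_cluster(triangle_cluster, 4, rng)
from_pair = oversample_cluster(pair_cluster, 4, rng)
```

Because every synthetic point lies inside a triangle or on a segment spanned by existing minority instances, it stays within the convex hull of its cluster, which is one plausible reading of how the method avoids shifting the data distribution.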

© 2025 The Institute of Electronics, Information and Communication Engineers