Total Quality Science
Online ISSN : 2189-3195
ISSN-L : 2189-3195
Imbalanced data classification procedure based on SMOTE
Daiki Gyoten, Masato Ohkubo, Yasushi Nagata

2020 Volume 5 Issue 2 Pages 64-71

Abstract

Now that the enormous amounts of data available on the Internet and generated by corporate systems can be handled, automatic classification technology has become increasingly important. Various learning methods have been proposed for solving two-class classification problems. Furthermore, ensemble learning methods, which combine multiple classifiers, can deliver high accuracy. However, class imbalance, in which the sample sizes of the classes to be classified differ greatly, often occurs. Ensemble learning methods have been reported to perform poorly on such imbalanced data.
Therefore, in this research, we propose a novel analysis procedure based on ensemble learning for such imbalanced data. The proposed procedure divides the data into several regions by clustering, applies an over-sampling technique to ease the imbalance, and learns a classification rule with the random forest method proposed by Breiman (2001). Through simulations, we show the effectiveness of the proposed procedure.
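
The abstract does not give the details of the over-sampling step, so the following is only a minimal sketch of SMOTE-style interpolation, not the authors' implementation. It assumes that the regions have already been obtained by some clustering method and that each region's minority class is over-sampled up to the size of its majority class. The function names `smote` and `balance_region`, the neighbour count `k`, and the balancing target are hypothetical choices made for illustration. Training the random forest on the balanced data would follow and is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # base sample, chosen at random
        x = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic


def balance_region(majority, minority, k=5, seed=0):
    """Over-sample the minority class of one region up to the majority size
    (assumed target; the abstract does not state the ratio used)."""
    deficit = max(0, len(majority) - len(minority))
    return list(minority) + smote(minority, deficit, k=k, seed=seed)


# Toy data for a single region (made up for illustration only).
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
balanced_minority = balance_region(majority, minority, k=2)
print(len(balanced_minority))  # 6: three original points plus three synthetic ones
```

Each synthetic point lies on the line segment between two existing minority samples, so over-sampling within a region keeps the new points close to the local minority distribution rather than spreading them across the whole feature space.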

© 2020 The Japanese Society for Quality Control