2013, Volume 64, Issue 2E, pp. 325-335
This paper presents a new approach that combines the k-nearest neighbor (k-NN) algorithm with the random forest method to handle imbalanced data sets in small-business credit assessment. Two classifiers are designed. The first, called the preliminary classifier, is constructed using the k-NN algorithm on the training data so as to preserve as much useful information about majority-class customers as possible. The second is constructed using the random forest method; it reclassifies the customers that the preliminary classifier predicted to belong to the minority class, in order to improve classification performance on the minority class. The proposed approach was applied to a credit assessment problem at a small company and compared with methods based on the k-NN algorithm alone or the random forest alone. The results show that the proposed approach is better able to identify insolvent customers, who form the minority class.
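
The two-stage scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, binary labels (0 = solvent majority, 1 = insolvent minority), and a random forest trained on the full training set with balanced class weights. The function names and the `flag_threshold` parameter are hypothetical; the abstract does not state how the preliminary classifier decides which customers to pass on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

MAJORITY, MINORITY = 0, 1  # assumed encoding: 0 = solvent, 1 = insolvent


def fit_two_stage(X_train, y_train, k=5, n_trees=200, seed=0):
    """Fit the preliminary k-NN classifier and the second-stage random forest."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Assumption: the forest sees all training data, reweighted by class frequency.
    forest = RandomForestClassifier(
        n_estimators=n_trees, class_weight="balanced", random_state=seed
    ).fit(X_train, y_train)
    return knn, forest


def predict_two_stage(knn, forest, X, flag_threshold=0.2):
    """Accept k-NN 'majority' calls; send the remaining customers to the forest.

    flag_threshold is a hypothetical knob: a customer is passed to the second
    stage when the share of minority-class neighbors reaches this value.
    """
    X = np.asarray(X)
    minority_col = list(knn.classes_).index(MINORITY)
    flagged = knn.predict_proba(X)[:, minority_col] >= flag_threshold
    y_pred = np.full(len(X), MAJORITY)
    if flagged.any():
        # Second stage: the forest makes the final call on flagged customers only.
        y_pred[flagged] = forest.predict(X[flagged])
    return y_pred
```

With `k=5` and `flag_threshold=0.2`, a customer goes to the second stage as soon as one of its five nearest neighbors is insolvent, so the preliminary stage settles only the clearly solvent cases. Synthetic imbalanced data from `make_classification(weights=[0.9, 0.1])` is enough to try the sketch.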