Rank-based Feature Selection of Diabetes Classification Problem with Positive Correlation

Md. Monirul Kabir; Md. Shahjahan; Kazuyuki Murase

doi:10.14864/softscis.2006.0.445.0

Abstract

Feature selection aims to select a set of attributes that is relevant for a given task. Because datasets often contain irrelevant and redundant attributes, selecting only the relevant ones can be expected to improve predictive accuracy. Conventionally, feature selection is performed after discretization; however, this drastically increases network size and computation. In this paper, we introduce a rank-based feature selection (RFS) method using positive correlation learning (PCL). A higher-ranking attribute is one that lies at a smaller distance from a reference point, and we consider an attribute relevant if its rank is high. We show empirically that removing irrelevant features yields higher classification accuracy. This is because PCL tries to create smaller coherent weights. We applied this approach to the diabetes classification problem, and the results indicate that RFS can easily remove the irrelevant features and produce better accuracy and generalization.
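The ranking rule stated in the abstract (a smaller distance from a reference point means a higher rank) can be illustrated with a minimal sketch. The abstract does not define the per-attribute quantity, the distance measure, or the reference point, so everything concrete here is an assumption made for illustration: the `scores` values, the `reference` value, the use of absolute difference as the distance, and the function names `rank_attributes` and `select_top`. It is not the authors' implementation and does not involve PCL training.

```python
def rank_attributes(scores, reference):
    """Order attribute indices from highest rank to lowest.

    Assumption: each attribute has one scalar score, and its distance
    from the reference point is the absolute difference. A smaller
    distance gives a higher rank, as the abstract describes.
    """
    distances = [abs(s - reference) for s in scores]
    return sorted(range(len(scores)), key=lambda i: distances[i])


def select_top(scores, reference, k):
    """Keep the k highest-ranking attributes (hypothetical cutoff rule)."""
    return rank_attributes(scores, reference)[:k]


# Hypothetical per-attribute scores and reference point, for illustration only.
scores = [0.9, 0.2, 0.55, 0.1]
reference = 1.0

ranking = rank_attributes(scores, reference)
selected = select_top(scores, reference, 2)
print(ranking)   # attribute indices, highest rank first
print(selected)  # indices of the attributes kept
```

How many attributes to keep (the `k` above) is left open by the abstract; in practice one might choose it by comparing classification accuracy on held-out data across several cutoffs.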
