Feature selection is a critical preprocessing step in machine learning that removes irrelevant and redundant data. Feature selection methods usually require sufficient samples to select a reliable feature subset, especially in the presence of outliers. However, sufficient samples cannot always be ensured in several real-world applications (e.g., neuroimaging, bioinformatics, psychology, and sport sciences). In this study, a method named Feature Selection Based on Data Quality and Variable Training Samples (QVT) was proposed to improve the performance of feature selection methods on low-sample-size data. Given that none of the considered feature selection methods performs optimally in all scenarios, QVT is primarily characterized by its versatility: it can be implemented with any feature selection method. An experiment was performed using 20 benchmark datasets, three feature selection methods, and three classifiers to verify the feasibility of QVT; the results suggested that QVT was applicable to different feature selection methods and significantly improved the predictive performance of different classifiers.