Development of a Variable Selection Method Using Rough Set Theory

光山 倫央; 荒川 正幹; 船津 公人

doi:10.11545/ciqs.2007.0.JP23.0

Abstract

In QSAR studies, it is known that a model using variables irrelevant to the objective variable often has low predictive ability. To build a model with high predictive ability, it is therefore important to select the minimal number of variables necessary for prediction. Many variable selection methods have been developed; we propose a new method using rough set theory (RST). RST is a theory used to classify inaccurate or incomplete data sets. By applying it to variable selection, we can find minimal subsets of variables that give the same partition as the whole set of variables (reduction sets). In this study, the method is applied to QSAR analyses of inhibitors of DHFR and HIV-1. We built PLS models using reduction sets and using randomly selected variables. The PLS models built from reduction sets gave predictive power no lower than that of the models built from randomly selected variables. Furthermore, it became clear that chemically meaningful variables tend to be selected by RST, which shows that RST is superior to random selection. In future work, we will continue to examine the advantages of RST by comparing it with other methods.
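
The idea of a reduction set can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes that the descriptors have already been discretized into a few levels, and it uses a brute-force search that is practical only for a handful of variables. The function names `partition` and `reducts` and the toy table `rows` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def partition(rows, attrs):
    """Group row indices by their values on the given attribute columns."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return {frozenset(block) for block in blocks.values()}


def reducts(rows, n_attrs):
    """Return all minimal attribute subsets that induce the same
    partition of the rows as the full attribute set (brute force)."""
    full = partition(rows, range(n_attrs))
    found = []
    # Search by increasing size so that smaller reducts are found first.
    for size in range(0, n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            # Skip supersets of a reduct already found: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if partition(rows, subset) == full:
                found.append(subset)
    return found


# Toy table (assumed data): 4 compounds x 4 discretized descriptors.
# Column 2 mirrors column 0, and column 3 duplicates column 1.
rows = [
    (0, 0, 1, 0),
    (0, 1, 1, 1),
    (1, 0, 0, 0),
    (1, 1, 0, 1),
]

result = reducts(rows, 4)
print(result)  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```

In this toy table every reduction set keeps one variable from each redundant pair, so two variables separate the compounds exactly as well as all four. A model such as PLS could then be fitted on the columns of one reduction set instead of the full descriptor matrix.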
