Proceedings of the Symposium on Chemoinformatics
38th Symposium on Chemoinformatics, Tokyo

Oral Session
Is overfitting really a problem?
*Hiromasa Kaneko, Kimito Funatsu
CONFERENCE PROCEEDINGS FREE ACCESS

Pages 28-31

Abstract
This presentation discusses the accuracy and applicability domains (ADs) of regression models. A regression model is generally constructed so as to avoid overfitting to the training data and to achieve high predictive performance for diverse compounds. An overfitted model, however, should still have high predictive ability within its AD, although that AD is narrow. In this study, an aqueous solubility data set was analyzed to compare the performance of regression models while taking their ADs into account. Support vector regression (SVR) was used as the regression method, and its hyperparameters were varied. The ADs were set based on data density. Two types of SVR models were found. The first type comprised well-constructed models that could predict solubility values for diverse compounds. The second type comprised overfitted models that appeared to have poor predictive ability overall but gave better predictions for compounds within their ADs than the first type did. These results confirm that overfitting is not a problem in itself and that overfitted models can be used in practice when their ADs are set appropriately.
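
The abstract says only that the ADs were set based on data density; it does not name the density measure or the threshold. As a rough illustration, the sketch below assumes one common choice: the mean Euclidean distance to the k nearest training points as an inverse-density score, with a threshold chosen so that a given fraction of training points falls inside the AD. The function names, `k=3`, `coverage=0.9`, and the toy grid of descriptors are all hypothetical and are not taken from the study. The last helper splits a model's error into inside-AD and outside-AD parts, which is the kind of comparison the abstract describes for the SVR models.

```python
import math

def mean_knn_distance(x, train, k, skip_self=False):
    """Mean Euclidean distance from x to its k nearest training points.

    A small value means x lies in a densely sampled region (assumed density proxy).
    """
    dists = sorted(math.dist(x, t) for t in train)
    if skip_self:
        dists = dists[1:]  # drop the zero distance of a training point to itself
    return sum(dists[:k]) / k

def fit_ad_threshold(train, k=3, coverage=0.9):
    """Distance threshold that keeps roughly `coverage` of training points inside the AD."""
    scores = sorted(mean_knn_distance(x, train, k, skip_self=True) for x in train)
    index = min(len(scores) - 1, math.ceil(coverage * len(scores)) - 1)
    return scores[index]

def in_domain(x, train, threshold, k=3):
    """True if x is at least as densely surrounded as the threshold requires."""
    return mean_knn_distance(x, train, k) <= threshold

def rmse_by_domain(y_true, y_pred, inside):
    """Return (rmse_inside, rmse_outside); an entry is None if that group is empty."""
    def rmse(pairs):
        pairs = list(pairs)
        if not pairs:
            return None
        return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))
    triples = list(zip(y_true, y_pred, inside))
    return (rmse((t, p) for t, p, flag in triples if flag),
            rmse((t, p) for t, p, flag in triples if not flag))

# Toy descriptor space (hypothetical): a 5 x 5 grid of training points in [0, 1]^2.
train = [(i * 0.25, j * 0.25) for i in range(5) for j in range(5)]
threshold = fit_ad_threshold(train, k=3, coverage=0.9)

near = (0.5, 0.5)  # sits on the training grid
far = (3.0, 3.0)   # well outside the sampled region
flags = [in_domain(q, train, threshold) for q in (near, far)]
```

With predictions from any regression model, `rmse_by_domain(y_true, y_pred, inside)` reports the error for compounds inside and outside the AD separately, so a model that looks poor overall can still be checked for accuracy within its own domain.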