Proceedings of the Symposium on Chemoinformatics
32nd Symposium on Chemical Information and Computer Sciences, Yamaguchi

Oral Session
Relationships between applicability domain and accuracy of prediction of regression models
*Masahiro Kaneko, Masamoto Arakawa, Kimito Funatsu
CONFERENCE PROCEEDINGS FREE ACCESS

Pages O1

Abstract
Multivariate regression methods such as partial least squares (PLS) and support vector regression (SVR) are powerful tools for handling a range of problems in chemoinformatics. These methods have been used to construct models with high predictive accuracy, and significant results have been obtained. However, because the predictive ability of a constructed model differs among the query samples it is applied to, it is important to estimate the prediction error for each sample. In this study, we therefore attempted to quantify the relationship between the applicability domain (AD) of a regression model and its prediction errors. The larger the distance to model (DM) of a query sample, the lower its prediction accuracy is estimated to be. As DM, we used the Euclidean distance to the mean of the training data and the Euclidean distance to the nearest sample in the training data; as regression methods, we used PLS and SVR. The proposed method was applied to quantitative structure-property relationship (QSPR) and soft sensor analyses. In the QSPR analysis, the accuracy of the estimated prediction errors improved. In the soft sensor analysis, the method achieved higher fault detection ability than a traditional method.
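The two distance-to-model measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `distance_to_mean` and `distance_to_nearest` and the sample values are hypothetical, and the descriptors are assumed to be already scaled so that Euclidean distances are comparable across variables. How DM is then mapped to an estimated prediction error is not specified in the abstract and is not shown here.

```python
import math


def distance_to_mean(train, query):
    """Euclidean distance from each query sample to the mean of the training data."""
    n = len(train)
    # Column-wise mean of the training descriptors.
    mean = [sum(col) / n for col in zip(*train)]
    return [math.dist(q, mean) for q in query]


def distance_to_nearest(train, query):
    """Euclidean distance from each query sample to its nearest training sample."""
    return [min(math.dist(q, t) for t in train) for q in query]


# Hypothetical, already-scaled descriptor vectors (illustrative values only).
X_train = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
X_query = [[1.0, 1.0], [5.0, 5.0]]

dm_mean = distance_to_mean(X_train, X_query)
dm_nearest = distance_to_nearest(X_train, X_query)

for q, d1, d2 in zip(X_query, dm_mean, dm_nearest):
    print(f"query={q}  DM(mean)={d1:.3f}  DM(nearest)={d2:.3f}")
```

In this toy data the second query sample lies far from the training set under both measures, so under the abstract's premise its prediction would be treated as less reliable than that of the first sample.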
© 2009 The Chemical Society of Japan