Abstract
In the experimental design of functional molecules, materials, and products, complicated relationships exist between experimental parameters and target physical and chemical properties. Regression analysis of experimental data is a useful way to understand those relationships, and a constructed regression model can be used to search efficiently for functional products. However, although such products may be found in domains outside the existing data, the predictive ability of the model tends to be low in regions where data density is low, and new candidates whose predicted property values are unreliable are unlikely to achieve the desired values. Therefore, to search for new candidates in appropriate extrapolation domains, we consider both the probability that a new candidate will have the intended values of a property and the reliability of the predicted value of the property for that candidate. The probability is calculated from the predicted value and its prediction error, estimated using a Gaussian process model, and the reliability is based on data density calculated with a one-class support vector machine (OCSVM) model. The proposed method is applied to simulation data and aqueous solubility data, and its efficiency is confirmed.
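As an illustration of the probability described above, the following minimal sketch assumes that the Gaussian process prediction for a candidate can be summarized by a mean and a standard deviation, and that the intended property values form an interval. The function names, the `in_domain` flag (a stand-in for a reliability decision such as one derived from an OCSVM), and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of N(mean, std^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def probability_in_range(mean, std, lower=-math.inf, upper=math.inf):
    """Probability that a property lies in [lower, upper].

    Assumption: the prediction is treated as normally distributed with the
    predicted value as its mean and the estimated prediction error as its
    standard deviation.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    return normal_cdf(upper, mean, std) - normal_cdf(lower, mean, std)


def rank_candidates(candidates, lower=-math.inf, upper=math.inf):
    """Rank reliable candidates by probability of reaching the target range.

    Each candidate is a tuple (name, mean, std, in_domain). The in_domain
    flag is a hypothetical placeholder for a data-density check; candidates
    flagged as unreliable are dropped before ranking.
    """
    scored = [
        (name, probability_in_range(mean, std, lower, upper))
        for name, mean, std, in_domain in candidates
        if in_domain
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical candidates: (name, predicted mean, predicted std, in_domain)
candidates = [
    ("A", 1.2, 0.1, True),   # modest prediction, small error
    ("B", 1.5, 1.0, True),   # higher prediction, large error
    ("C", 3.0, 0.2, False),  # attractive prediction, but outside the data domain
]

for name, prob in rank_candidates(candidates, lower=1.0):
    print(f"{name}: P(y >= 1.0) = {prob:.3f}")
```

In this sketch, candidate A ranks above B even though B has the higher predicted value, because the larger prediction error of B lowers its probability of exceeding the threshold. Candidate C is excluded regardless of its predicted value, which mirrors the idea of restricting the search to domains where predictions are considered reliable.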