Data Science Center, and Graduate School of Science and Technology, Nara Institute of Science and Technology
Kimito Funatsu
Data Science Center, and Graduate School of Science and Technology, Nara Institute of Science and Technology
Department of Chemical System Engineering, School of Engineering, The University of Tokyo
Iterative screening surveys a small set of (virtual) compounds at a time: their property values are determined by experiments and used as feedback for updating quantitative structure-property (activity) relationship models. This cycle is repeated until compounds exhibiting the desired or better property (activity) values are identified. In the present work, we conducted a series of virtual experiments to assess the characteristics of different iterative screening methods using compounds from the ZINC and ChEMBL databases. Overall, batch-based Bayesian optimization with Gaussian processes, which imposes a penalty on the acquisition function for compounds proximal to compounds already sampled in the batch, performed better in terms of the number of iterations needed to identify one of the goal compounds. Linear regression models that did not take the applicability domain of the regression model into account also worked consistently for the property for which a key factor was present in the set of molecular descriptors.
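The iterative screening cycle summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the present work: the descriptors and property values are synthetic, the Gaussian process uses a fixed RBF kernel with hypothetical `length_scale` and `noise` values, the acquisition function is expected improvement, and the penalty is a simple multiplicative factor with a hypothetical `radius` that suppresses candidates close to compounds already chosen for the batch. The function names (`gp_posterior`, `select_batch`, `iterative_screening`) are likewise hypothetical.

```python
import numpy as np
from math import erf, sqrt, pi

_erf = np.vectorize(erf, otypes=[float])

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X_train, y_train, X_pool, noise=1e-4):
    # GP posterior mean and standard deviation (constant mean, unit prior variance).
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_pool)
    mu = y_mean + K_s.T @ np.linalg.solve(K, y_train - y_mean)
    var = 1.0 - (K_s * np.linalg.solve(K, K_s)).sum(axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    # Expected improvement over the best value observed so far (maximization).
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + _erf(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    return np.maximum((mu - best) * cdf + sigma * pdf, 0.0)

def select_batch(X_pool, acq, batch_size, radius=1.0):
    # Greedy batch selection: after each pick, shrink the acquisition of
    # candidates close to that pick (assumed penalty form, for illustration).
    acq = np.asarray(acq, dtype=float).copy()
    available = np.ones(len(acq), dtype=bool)
    chosen = []
    for _ in range(batch_size):
        i = int(np.argmax(np.where(available, acq, -np.inf)))
        chosen.append(i)
        available[i] = False
        d2 = ((X_pool - X_pool[i]) ** 2).sum(axis=1)
        acq *= 1.0 - np.exp(-0.5 * d2 / radius**2)
    return chosen

def iterative_screening(X, y, goal, n_init=5, batch_size=3, max_iter=20, seed=0):
    # Repeat: fit model on sampled compounds, pick a batch, "measure" it.
    rng = np.random.default_rng(seed)
    sampled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    n_iter = 0
    while y[sampled].max() < goal and n_iter < max_iter:
        pool = np.setdiff1d(np.arange(len(X)), sampled)
        mu, sigma = gp_posterior(X[sampled], y[sampled], X[pool])
        acq = expected_improvement(mu, sigma, y[sampled].max())
        picks = select_batch(X[pool], acq, batch_size)
        sampled.extend(int(j) for j in pool[picks])
        n_iter += 1
    return n_iter, sampled

# Synthetic stand-in for a compound library: 200 "compounds", 4 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = -((X - 0.5) ** 2).sum(axis=1)   # hypothetical property to maximize
goal = np.sort(y)[-5]               # goal compounds: the top five
n_iter, sampled = iterative_screening(X, y, goal)
print(n_iter, len(sampled))
```

The returned `n_iter` corresponds to the evaluation criterion used in the abstract, namely the number of iterations needed before one of the goal compounds is sampled. The greedy penalization step is what keeps a batch from being filled with near-duplicate compounds around a single acquisition maximum.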