Abstract
Estimating synthetic accessibility is an important part of computer-aided drug design. Several methods for predicting synthetic accessibility have been reported, based on retrosynthetic analysis, molecular complexity, or fragment contributions. However, almost none of them use machine learning. Here we report a machine learning method for predicting synthetic accessibility. Because synthetic accessibility is a subjective judgment, it is difficult to prepare a large-scale training set. We therefore assume that the compounds that remain after removing the ZINC15 compounds (purchasable “drug-like” compounds) from the GDB-17 compounds (the chemical universe database of compounds with up to 17 atoms of C, N, O, S, and halogens) are likely to be difficult to synthesize, and that ZINC15 compounds are easier to synthesize than these remaining compounds. Based on this hypothesis, we created a data set and used it to train a neural network classifier. We then evaluated the model on a validation set obtained from the literature. The results show that the model was able to distinguish compounds that are difficult to synthesize from those that are easier to synthesize. We are developing models based on other machine learning methods and expect to report a comparison with the neural network model.
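The labeling hypothesis can be illustrated with a minimal sketch. It assumes that each database is available as a collection of canonical SMILES strings produced by the same canonicalization procedure, so that plain string comparison identifies identical compounds; that assumption, the function name `build_labeled_set`, and the label encoding (0 for easier, 1 for difficult) are illustrative choices and not details taken from this work.

```python
from typing import Iterable, List, Tuple


def build_labeled_set(
    gdb17_smiles: Iterable[str],
    zinc15_smiles: Iterable[str],
) -> List[Tuple[str, int]]:
    """Label compounds following the hypothesis described above.

    Assumption: both inputs hold canonical SMILES from the same
    canonicalizer, so equal strings mean equal compounds.
    Label 0 = ZINC15 compound (assumed easier to synthesize).
    Label 1 = GDB-17 compound absent from ZINC15 (assumed difficult).
    """
    easier = set(zinc15_smiles)
    # Set difference: GDB-17 compounds that are not purchasable via ZINC15.
    difficult = set(gdb17_smiles) - easier
    # Sort each class so the output order is deterministic.
    labeled = [(smi, 0) for smi in sorted(easier)]
    labeled += [(smi, 1) for smi in sorted(difficult)]
    return labeled


if __name__ == "__main__":
    # Toy placeholder strings; they do not reflect real database membership.
    toy_gdb17 = ["A", "B", "C"]
    toy_zinc15 = ["B", "D"]
    for smiles, label in build_labeled_set(toy_gdb17, toy_zinc15):
        print(smiles, label)
```

In practice the two classes would differ greatly in size, so some form of class balancing would probably be needed before training; the sketch leaves that step out.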