Proceedings of the Symposium on Chemoinformatics
42nd Symposium on Chemoinformatics, Tokyo

Oral Session (B)
On the Number of Sample Data Required in Machine Learning in Chemistry: A Computer Simulation Study using the Marcus Theory for Electron Transfer
*Kazuki Yoshida, Manabu Sugimoto
CONFERENCE PROCEEDINGS FREE ACCESS

Pages 1B03-

Abstract

Applications of machine learning methods to chemistry and materials science have attracted much attention in recent studies. In these fields, only a limited number of experimental data points are usually available for supervised machine learning (SML). Herein, we performed a model study on the accuracy of SML in obtaining regression models for the electron-transfer rate from a small set of reference data. The model data were prepared by applying the Marcus theory of electron transfer. The three parameters in the formula that reflect the characteristics of the reaction substrate were generated using random numbers, and 1000 data points were created for the training and test sets. Subsets of arbitrary size were drawn from the training set, errors of 0-30% were added, and predictive performance was compared using support vector regression (SVR). As a result, when the training data contained no error, at least 30 data points were required for the R² value of the test set to reach 0.8 or more.
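The data-generation step described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' procedure. The abstract does not name the three parameters, their sampling ranges, or the form of the added error, so the code assumes the nonadiabatic Marcus expression k = (2π/ħ)·|H_AB|²·(4πλk_BT)^(-1/2)·exp(-(ΔG° + λ)² / (4λk_BT)). It takes the electronic coupling H_AB, the reorganization energy λ, and the driving force ΔG° as the three random inputs. It also assumes uniform sampling ranges in eV, uniform relative noise, and log10(k) as the regression target. The names `marcus_rate`, `make_dataset`, and `r2_score` are hypothetical helpers.

```python
import math
import random

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s
KB_EV_K = 8.617333262e-5     # Boltzmann constant in eV/K


def marcus_rate(h_ab, lam, dg, temperature=298.15):
    """Nonadiabatic Marcus rate constant in 1/s; all energies in eV."""
    kt = KB_EV_K * temperature
    prefactor = (2.0 * math.pi / HBAR_EV_S) * h_ab ** 2
    prefactor /= math.sqrt(4.0 * math.pi * lam * kt)
    return prefactor * math.exp(-((dg + lam) ** 2) / (4.0 * lam * kt))


def make_dataset(n, max_rel_error=0.0, seed=0):
    """Return n pairs ((h_ab, lam, dg), log10 of the noisy rate).

    The sampling ranges and the uniform relative noise model are
    assumptions made for illustration; the abstract does not give them.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        h_ab = rng.uniform(0.001, 0.05)  # electronic coupling (assumed range)
        lam = rng.uniform(0.5, 1.5)      # reorganization energy (assumed range)
        dg = rng.uniform(-1.0, 0.0)      # driving force (assumed range)
        rate = marcus_rate(h_ab, lam, dg)
        # Relative error drawn uniformly from [-max_rel_error, +max_rel_error]
        noise = rng.uniform(-max_rel_error, max_rel_error)
        rows.append(((h_ab, lam, dg), math.log10(rate * (1.0 + noise))))
    return rows


def r2_score(y_true, y_pred):
    """Coefficient of determination, the metric quoted in the abstract."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    train = make_dataset(30, max_rel_error=0.1, seed=1)  # small noisy training subset
    test = make_dataset(1000, max_rel_error=0.0, seed=2)  # noise-free test set
    print(len(train), len(test))
```

The regression step itself is left out to keep the sketch free of third-party dependencies. One option would be to fit `sklearn.svm.SVR` from scikit-learn on the training pairs and pass its test-set predictions to `r2_score`, repeating this for different subset sizes and noise levels.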
