Journal of Computer Chemistry, Japan
Online ISSN : 1347-3824
Print ISSN : 1347-1767
ISSN-L : 1347-1767

This article has now been updated. Please use the final version.

Development of Machine Learning Models with Fragment Molecular Orbital Calculation Data
Koichiro KATO, Hiromu MATSUMOTO, Ryosuke KITA
Advance online publication (free access)

Article ID: 2024-0015

Abstract

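The fragment molecular orbital (FMO) method is one of the few methods that can perform quantum chemical calculations on an entire protein. The data produced by the FMO method are likewise, at present, the only quantum chemical calculation data available for protein systems. Quantum chemical data for proteins are difficult to generate with general-purpose software, and the development of machine learning models based on such data is expected to have a major impact on AI drug discovery, a field that has grown markedly in recent years. This article outlines the status of the machine learning models that the authors' group is developing with FMO data: an atomic charge prediction model, an interaction prediction model, and a machine learning force field.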

Translated Abstract

The fragment molecular orbital (FMO) method is a unique method that allows quantum mechanical (QM) calculations of entire proteins. The data obtained by the FMO method are also currently the only QM calculation data available for protein systems. QM calculation data for proteins are difficult to generate with general-purpose software, and the development of machine learning models using such data is expected to have a significant impact on AI drug discovery, which has been remarkably active in recent years. This paper outlines the status of the machine learning models being developed with FMO data in the authors' group.

Figures
Figure 1

Comparison of FMO-calculated and neural network (NN)-predicted RESP charges for the structure with the highest coefficient of determination (R²) in the test data.

Figure 2

Comparison of FMO-calculated and NN-predicted IFIE-Sum values for 20 poses of each of the 15 test ligands of CDK2 (300 data points in total).

Figure 3

Benchmark of machine learning force field training scripts for a water-only system.

Tables
Table 1 Details of the dataset used to build the atomic charge prediction model. Note that the numbers in the table do not include water molecules.

                   polyQ10     TrpCage     BRD2
PDB ID             2OTU        1L2Y        5IBN
Residues (atoms)   10 (173)    20 (304)    111 (1,811)
MD snapshots       10,000      10,000      1,000
FMO data           10,000      9,805       902
Train              7,997       7,840       718
Validation         2,000       1,961       180
Test               3+1,000     4+989       4+180

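
The "a+b" notation in the Test row is not explained in this section. One consistency check, sketched below in Python, is that Train + Validation + the first Test term reproduces the FMO data count for each target. The dictionary layout and the helper name `split_total` are illustrative assumptions, and the second Test term is deliberately left uninterpreted.

```python
# Counts copied from Table 1. The dictionary layout is illustrative only.
# "test" holds the two terms of the "a+b" entry; the meaning of the second
# term is not stated in the table, so it is not used here.
datasets = {
    "polyQ10": {"fmo": 10_000, "train": 7_997, "validation": 2_000, "test": (3, 1_000)},
    "TrpCage": {"fmo": 9_805, "train": 7_840, "validation": 1_961, "test": (4, 989)},
    "BRD2": {"fmo": 902, "train": 718, "validation": 180, "test": (4, 180)},
}


def split_total(entry):
    """Return train + validation + the first test term (hypothetical helper)."""
    return entry["train"] + entry["validation"] + entry["test"][0]


for name, entry in datasets.items():
    # Print the summed split next to the reported FMO data count.
    print(name, split_total(entry), entry["fmo"])
```

For all three targets the two printed numbers agree, so the first Test term appears to be drawn from the FMO data set itself.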
Table 2 Coefficient of determination (R²) and net charge for the test data at the cluster center of each target. The net charge is the sum of the atomic charges of all atoms. The correct net charges for polyQ10, TrpCage, and BRD2 are 0, 1, and 1, respectively.

        polyQ10               TrpCage               BRD2
Test    R²       Net charge   R²       Net charge   R²       Net charge
1       0.982    -0.31        0.928    1.50         0.896    1.35
2       0.960    -0.23        0.926    0.97         0.888    3.63
3       0.955    0.46         0.917    0.51         0.886    -0.48
4                             0.900    1.38         0.873    4.60
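
The two quantities reported in Table 2 can be illustrated with a minimal Python sketch. It assumes the common definition R² = 1 - SS_res / SS_tot, which this section does not confirm, and it uses made-up toy charges rather than any data from the article. The function names `r_squared` and `net_charge` are hypothetical.

```python
import math


def r_squared(reference, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_ref = math.fsum(reference) / len(reference)
    ss_res = math.fsum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = math.fsum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def net_charge(charges):
    """Net charge as the sum of the atomic charges of all atoms."""
    return math.fsum(charges)


# Toy atomic charges (hypothetical values, not taken from the article).
fmo_charges = [-0.5, 0.3, 0.2, 0.4, -0.4]   # reference charges, net charge 0
nn_charges = [-0.4, 0.3, 0.1, 0.4, -0.3]    # predicted charges

print("R2:", round(r_squared(fmo_charges, nn_charges), 3))
print("Net charge (predicted):", round(net_charge(nn_charges), 3))
```

The toy example shows why the table reports both numbers: small per-atom errors can leave R² high while still shifting the summed net charge away from its correct value.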
© 2024 Society of Computer Chemistry, Japan