Article ID: 2024-0015
Abstract: The fragment molecular orbital (FMO) method is one of the few methods that make quantum mechanical (QM) calculations feasible for entire proteins. The data it produces are, at present, the only QM calculation data available for protein systems. Such protein QM data are difficult to generate with general-purpose software, and the machine learning models built on them are expected to have a significant impact on AI drug discovery, a field that has become markedly more active in recent years. This paper outlines the current status of the machine learning models that the authors' group is developing from FMO data: an atomic charge prediction model, an interaction prediction model, and a machine learning force field.
Comparison of FMO-calculated and NN-predicted RESP charges for the test structure with the highest coefficient of determination (R²).
Comparison of FMO-calculated and NN-predicted IFIE-Sum values for 20 poses of each of the 15 test ligands of CDK2 (300 data points in total).
Benchmark of machine learning force field training scripts for a water-only system.
System | polyQ10 | TrpCage | BRD2 |
PDB ID | 2OTU | 1L2Y | 5IBN |
Residues (atoms) | 10 (173) | 20 (304) | 111 (1,811) |
MD snapshots | 10,000 | 10,000 | 1,000 |
FMO data | 10,000 | 9,805 | 902 |
Train | 7,997 | 7,840 | 718 |
Validation | 2,000 | 1,961 | 180 |
Test | 3+1,000 | 4+989 | 4+180 |
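The counts in the table above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that the first number in each "Test" cell (3, 4, 4) is drawn from the same FMO data pool as the training and validation sets, which the table does not state explicitly. The dictionary layout and the function name `split_summary` are hypothetical.

```python
# Counts copied from the dataset table above.
# "test" holds only the first number of each "Test" cell (assumption:
# these structures come from the same FMO data pool as train/validation).
datasets = {
    "polyQ10": {"fmo": 10_000, "train": 7_997, "validation": 2_000, "test": 3},
    "TrpCage": {"fmo": 9_805, "train": 7_840, "validation": 1_961, "test": 4},
    "BRD2": {"fmo": 902, "train": 718, "validation": 180, "test": 4},
}


def split_summary(counts):
    """Return (sum of the three subsets, train share of train + validation)."""
    total = counts["train"] + counts["validation"] + counts["test"]
    train_fraction = counts["train"] / (counts["train"] + counts["validation"])
    return total, train_fraction


for name, counts in datasets.items():
    total, fraction = split_summary(counts)
    print(
        f"{name}: subsets sum to {total} (FMO data: {counts['fmo']}), "
        f"train fraction {fraction:.3f}"
    )
```

Under that reading, the three subsets add up exactly to the "FMO data" row for every system, and training and validation are split at roughly 80:20.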
System | polyQ10 | | TrpCage | | BRD2 | |
Test | R² | Net charge | R² | Net charge | R² | Net charge |
1 | 0.982 | -0.31 | 0.928 | 1.50 | 0.896 | 1.35 |
2 | 0.960 | -0.23 | 0.926 | 0.97 | 0.888 | 3.63 |
3 | 0.955 | 0.46 | 0.917 | 0.51 | 0.886 | -0.48 |
4 | | | 0.900 | 1.38 | 0.873 | 4.60 |
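For readers who want to reproduce a metric like the R² column above, the following is a minimal sketch using the standard definition of the coefficient of determination, 1 - SS_res/SS_tot. Whether the authors used this definition or the squared Pearson correlation is not stated here, so treat the choice as an assumption. The charges are synthetic random numbers, not FMO data, and the function name `r2_score` is hypothetical. Summing the predicted charges is shown only as one plausible reading of the "Net charge" column, which the table does not define.

```python
import numpy as np


def r2_score(reference, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((reference - predicted) ** 2)
    ss_tot = np.sum((reference - reference.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins for per-atom charges (NOT real FMO or NN output).
# 173 atoms matches the polyQ10 system size listed in the dataset table.
rng = np.random.default_rng(0)
reference_charges = rng.normal(0.0, 0.4, size=173)
predicted_charges = reference_charges + rng.normal(0.0, 0.05, size=173)

print("R2:", r2_score(reference_charges, predicted_charges))
# Assumed reading of "Net charge": the sum of predicted atomic charges.
print("Sum of predicted charges:", predicted_charges.sum())
```

With this definition, a perfect prediction gives R² = 1 and a prediction that always returns the mean of the reference values gives R² = 0.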