Investigation of methods for constructing prediction models for the small dataset of Cytochrome P450 (CYPs) inhibition using deep learning

Elpri E. PERMADI; Reiko WATANABE; Kenji MIZUGUCHI

doi:10.14869/toxpt.51.1.0_O-19

Abstract

The cytochrome P450 (CYP) superfamily metabolizes diverse compounds, and drug-induced CYP inhibition can lead to adverse drug-drug interactions. Identifying potential CYP inhibitors is therefore crucial for safe drug administration. This study explored multitask deep learning with graph convolutional networks (GCN) to predict CYP inhibition when training data are limited. Public databases provided data on 12,654 compounds for seven CYP isoforms, including two small datasets for CYP 2B6 and 2C8 (481 and 724 compounds, respectively). A baseline model that classifies compounds as inhibitors or non-inhibitors was built with kMoL, but the small size and class imbalance of the 2B6 and 2C8 datasets limited its prediction performance for these isoforms. Multitask and fine-tuning models were therefore implemented to improve predictions for 2B6 and 2C8. They produced modest improvements, but the differences were not statistically significant. In addition, the multitask model performed worse when more than 50% of the data were missing. Imputing the missing data with predictions from either single-task or multitask models significantly improved the F1 and Kappa values for the small datasets. Notably, a multitask model trained on data imputed by a multitask model outperformed all other approaches. This study demonstrated that multitask deep learning, particularly with imputation of missing values, can effectively improve the prediction performance of models on small datasets.
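
The imputation step described above can be illustrated with a minimal sketch. It is not the authors' code and does not use the kMoL API. It assumes a compound-by-isoform label matrix in which missing measurements are stored as None, a matrix of inhibitor probabilities from some previously trained model, and a 0.5 decision threshold. All function names and the toy data are hypothetical. The sketch also includes plain implementations of the two metrics named in the abstract, F1 and Cohen's Kappa, for binary labels.

```python
from typing import List, Optional, Sequence

# Rows are compounds, columns are CYP isoforms (tasks).
# 1 = inhibitor, 0 = non-inhibitor, None = not measured.
LabelMatrix = List[List[Optional[int]]]


def missing_fraction(labels: LabelMatrix, task: int) -> float:
    """Fraction of compounds with no measured label for one task."""
    column = [row[task] for row in labels]
    return sum(value is None for value in column) / len(column)


def impute_missing_labels(
    labels: LabelMatrix,
    predicted_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the source
) -> List[List[int]]:
    """Fill missing labels with thresholded model predictions.

    Measured labels are kept unchanged; only None entries are replaced.
    """
    imputed = []
    for label_row, prob_row in zip(labels, predicted_probs):
        imputed.append([
            int(prob >= threshold) if label is None else label
            for label, prob in zip(label_row, prob_row)
        ])
    return imputed


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (inhibitor) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's Kappa: agreement corrected for chance, binary labels."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_true = sum(y_true) / n
    p_pred = sum(y_pred) / n
    expected = p_true * p_pred + (1 - p_true) * (1 - p_pred)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Toy data: 4 compounds, 2 tasks (hypothetical values).
labels = [[1, None], [0, 1], [None, 0], [1, None]]
probs = [[0.9, 0.7], [0.2, 0.8], [0.4, 0.1], [0.6, 0.3]]
imputed = impute_missing_labels(labels, probs)
```

In this sketch the filled matrix would then serve as the training target for a second model, which corresponds to the "impute, then retrain" idea in the abstract. Whether imputed labels should be weighted differently from measured ones is a design choice the abstract does not specify.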
