Chem-Bio Informatics Journal
Online ISSN : 1347-0442
Print ISSN : 1347-6297
ISSN-L : 1347-0442
calculation report
Visualization of Substructure Contributing to Metabolic Stability using Morgan Fingerprint
Tsuyoshi EsakiHirofumi WatanabeYugo ShimizuReiko WatanabeYuki DoiSeisuke TakimotoKazuyoshi Ikeda
Author information
JOURNAL OPEN ACCESS FULL-TEXT HTML
Supplementary material

2025 Volume 25 Pages 28-35

Details
Abstract

The cost and time required for drug discovery have increased, prompting the need for more efficient methods to predict compound properties using artificial intelligence (AI). In particular, the prediction of absorption, distribution, metabolism, and excretion (ADME) properties is crucial. Intrinsic metabolic clearance (CLint) is an essential property in ADME because it affects the side effects or the dosage schedule. In silico machine learning (ML) models for estimating CLint have been developed, but due to the inherent complexity of AI techniques, it is difficult to obtain ideas for a more effective structure. Therefore, we employed SHAP (SHapley Additive exPlanations), an explainable AI (XAI) method, to elucidate the contributions of molecular substructures to CLint predictions. We constructed a random forest model, classified into low and high clearance categories. SHAP values were calculated to visualize the importance of features, and significant substructures influencing CLint were identified. The model demonstrated a high recall for predicting low-CLint compounds; however, its overall accuracy for high-CLint classification was low. This highlights its potential for filtering out low-CLint compounds rather than accurately identifying those with high-CLint. Visualization of SHAP values provided insights into substructure modifications to improve CLint, thus providing valuable guidance for drug optimization. This approach highlights the effectiveness of integrating explainable AI methods to improve the interpretability of ML models in drug discovery.

1. Introduction

The costs and time required for drug discovery have increased, which has encouraged the use of artificial intelligence (AI), including machine learning techniques, for effective drug discovery. An important use of AI is the prediction of the properties of compounds using features of the compound structure. Predicting properties such as absorption, distribution, metabolism, and excretion (ADME) is necessary for drug discovery because ADME directly affects the estimation of the administered dosage and determination of the dosage schedule. Therefore, machine learning (ML) models are required to predict ADME properties.

Intrinsic metabolic clearance was measured using liver microsomes (CLint), which is an in vitro ADME property. CLint is an integrated characteristic of various metabolic enzymes. As they lengthen the time drugs stay in the body and extend the duration of their effect, low-CLint compounds are required for the development of drug-like compounds generally. However, whether compounds need low-CLint depends on the therapeutic goal, not a universal rule for drug development. For chronic conditions with long-lasting effects and long half-lives, low-CLint compounds are required. However, in some cases, for example, drugs that are designed to act for a short period (e.g., anesthetics or certain acute treatments) can benefit from high-CLint to ensure rapid clearance, and drugs with high-CLint are acceptable or even beneficial. There are instances where a high-CLint can be acceptable or even desirable, such as in the case of anesthetics. However, it is also true that metabolism can lead to the emergence of side effects, and for drugs where prolonged efficacy is expected, the case in which low-CLint is required is major. Therefore, to estimate the metabolic characteristics of seed compounds, researchers have constructed AI models to classify the compounds in terms of the CLint degree [1, 2].

However, understanding the reasons for these predictive results is difficult because AI models are generally complex. Understanding the reasons for the prediction is required for the improvement of metabolic stability. Therefore, methods have been developed to explain the results predicted by AI (explainable AI or XAI) [3].

The predicted results Shapley additive explanation (SHAP) [4], a commonly used XAI method, is an effective application. In this study, SHAP was used to evaluate and visualize the substructures that contribute to CLint predictions.

2. Materials and Methods

A predictive model for CLint was constructed to evaluate the effectiveness of SHAP. The collected dataset was separated into training and test sets using a clustering algorithm in Pipeline Pilot (Dassault Biovia [2017]). To predict the CLint of the compounds, the descriptors of the compounds were calculated, and a machine learning model was constructed in a Python environment (ver. 3.9). For model construction and compound visualization, Morgan fingerprint was performed using RDKit (ver. 2024.5.1) [5]. Demonstration of SHAP and its visualization were performed in Python. The details are as follows.

2.1 Dataset

The experimental data on CLint were obtained from a public database DruMAP [6]. Duplicate compounds were removed, and 4218 compounds were retained as datasets for this study. After discussing with medicinal chemistry researchers, the accuracy of the model must be above a certain level in order to discuss explainability because the explainability of a low-precision model is meaningless. Furthermore, since the practically useful threshold for drug discovery varies from phase to phase, the threshold was determined in consideration of the balance between the performance of the machine learning model made by the data set and its pharmacological implications. Therefore, the CLint class was divided into two classes. (Low (CLint ≦ 80 (μL/min/mg)): 3018 compounds, High (80 < CLint (μL/min/mg)): 1200 compounds)

The compounds in the dataset were clustered to separate the training and test sets. Random sampling is typically used to separate the data. However, we used a clustering algorithm to extract test compounds for which similar compounds were not included in the training set. To cluster the compounds, the clustering-molecule component was used in Pipeline Pilot under the following conditions: descriptor FCFP6 (functional connectivity fingerprint for all subgraphs with a diameter of up to size 6) [7], the average number of clusters was 10, MaxDistance was 0.65, and the metric between compounds was calculated by Euclidean distance. Following clustering, the compounds were separated into 27 classes. Cluster number 4 (39 compounds, including low 24 and high 15 compounds) was selected as the test set, and the compounds in the other clusters were defined as training sets. (4179 compounds, including low 2994 compounds and high 1185 compounds)

Table 1. Summary of datasets. Average and standard deviation (SD) of molecular weight and logP were calculated using RDKit

Dataset

Number of compounds

(Low / High)

Molecular weight

Average (SD)

LogP

Average (SD)

Training

4179

(2994 / 1185)

424.487

(91.434)

3.650

(1.188)

Test

(cluster number 4)

39

(24 / 15)

470.292

(29.977)

4.673

(0.873)

2.2 Morgan fingerprint calculation and model construction

Morgan fingerprints with radial=2 and 4096 bits were used in RDKit for descriptor calculation. To demonstrate SHAP the decision tree algorithm uses the shap (ver. 0.45.0) module in Python [8]. Among the tree algorithms, random forest (RF) [9] is a high-performance and quick model construction. Therefore, the scikit-learn module [10] was used to construct the RF model.

RF is an ensemble model based on a decision tree algorithm that requires three parameters: n_estimators, the number of trees; max_depth, the maximum depth; and min_samples_split, the minimum number of samples required to split an internal node. The most proper parameters were evaluated using GridsearchCV (parameter: cv=5, score="accuracy"). Consequently, the selected parameters were determined as follows (Table 2). Based on these metrics, this model demonstrated a high recall for predicting low-CLint compounds but had difficulty accurately predicting high-CLint compounds. The aim of this study is to examine the explanatory potential of XAI, and it is not assumed that the model itself will be used in the field as is. Therefore, although a high prediction accuracy is desired, the model was deemed acceptable because of its ability to correctly infer low-CLint compounds in practice.

Table 2. Summary of Random Forest training

Parameter Value (Values used for parameter tuning)
n_estimators 500 (300, 400, 500, 600, 700)
max_depth 17 (14, 15, 16, 17, 18)
min_sample_split 8 (5, 6, 7, 8, 9)
Performance Score
Best mean accuracy 0.734
Standard deviation for CV 0.019
Sensitivity (high-CLint) 0.917
Sensitivity (low-CLint) 0.067
Specificity (high-CLint) 0.067
Specificity (low-CLint) 0.917
Matthews correlation coefficient -0.030
Area of under the curve 0.492
F-1 score 0.733

2.3 Calculation of SHAP

SHAP calculates the importance value of each feature for prediction based on game theory. This method is based on an additive model and the feature contribution is quantitatively calculated. The function, TreeExplainer [11] in the shap module of Python was used to calculate the SHAP values of the test compounds. Waterfall plots and the contribution of each sample were visualized.

3. Results and Discussion

3.1 Predictive Accuracy

The optimized model was used to predict the CLint classes of the test compounds, and the results are listed in Table 3. Using the confusion matrix, the accuracy of this model was 0.590, which was not a high score. This model tends to estimate compounds to be “low”-CLint. The training set seemed to have a few similar compounds to the test set because the dataset was separated by structure rather than randomly. However, the model had a high recall (0.917) for predicting low-CLint compounds, meaning it effectively selected low-CLint candidates.

Table 3. Confusion matrix for CLint classification

Predicted Class
Low High
Actual Class Low 22 2
High 14 1

However, its accuracy for high-CLint classification was low, indicating that it struggled to correctly identify high-CLint compounds. The aim of this study is to examine the explanatory potential of XAI, and it is not assumed that the model itself will be used in the field as is. Therefore, although a high prediction accuracy is desired, the model was deemed acceptable because of its ability to correctly infer low compounds in practice.

3.2 SHAP value for all test compounds

The SHAP values for all test compounds were calculated and plotted as a summary plot (Figure 1). The high SHAP value in this model increased the probability of categorizing compounds as high-CLint. Fingerprint 3247 is associated with an increased probability of high-CLint, as indicated by its SHAP values. Similarly, the absence of fingerprints 3291 and 3802 is associated with a decreased probability of classification into the high-CLint category. These results suggest that the presence or absence of specific fingerprints can influence the model’s prediction of high-CLint compounds. Having many substructures (fingerprint=1) results in increased sites for metabolic enzymes to metabolize; therefore, a high feature value (fingerprint=1) tends to push up the SHAP value to categorize the compounds into the high-CLint class.

Figure 1. Contributions of descriptors to high-CLint classification

The x- and y-axes show the SHAP values and fingerprints, respectively. A high SHAP value (right side of the x-axis) tended to distinguish the compound with a high-CLint. The color bar shows the volume of the fingerprint; pink and blue indicate fingerprints = 1 and 0, respectively.

3.3 SHAP value for a target test compound and visualization of essential substructure

Low-CLint compounds are required for the development of drug-like compounds generally. Therefore, to examine the features that contribute to the output of the provability of distinguishment into high-CLint, the feature contribution of compound name 32759 (both observed and predicted classes were high) in the test set was represented as a SHAP value in Figure 2. In this figure, fingerprint 3247 is the feature that most strongly indicates the probability of a high-CLint class moving toward the output value. Fingerprints 3291 and 1066 push the probability to the high-CLint category.

Figure 2. Contributions of descriptors to high-CLint classification

The force plot shows the contribution of each fingerprint in distinguishing compound 32759 from the high-CLint class. Fingerprint 3247 significantly increased the output as follows: 3291 and 1066. The bold value, which indicates the probability that this compound is distinguished into a high class, implies that the model predicts this compound with low confidence.

The substructures estimated to make a significant contribution are shown in the structure. (Figure 3) The substructure of fingerprint 3247 is a part of the substructure consisting of atoms bonded to the central carbon atoms belonging to the benzene ring, within a radius of 2. This substructure pushes the output value to a high class; therefore, the structure is colored according to the SHAP value. SHAP value was converted using the following formula to determine the color. The substructure, which increases the probability of being a “high” class, was colored pink based on the converted SHAP color value (here, s represents SHAP values and max_shap means the max value of the shap score.)

converted SHAP color value =(1-sqrt(|s|))/(max_shap)

The substructures of high importance for estimating CLint were clarified by showing their structures. Furthermore, the essential substructures were revealed by varying the color intensity based on the SHAP values. The substructure of fingerprint 3247 indicates that it is likely to accelerate metabolism. When we checked compounds with fingerprint 3247 in the training data, many represented parts of the benzene rings in the training compounds. (Figure S1 in Supporting Information) Fingerprint 3247 represents a substructure of a benzene ring, so it is generally easily metabolized, and it is reasonable that fingerprint 3247 is found in compounds with high-CLint while it is also found in compounds with low-CLint, but there are many hydrophilic parts in the parts other than fingerprint 3247. Substructure 3247 represents a part of the benzene ring that could be added.

Next, we examined the compound with 1387 in the training data and found that it corresponded to multiple substructures. (Figure S2 in Supporting Information) Although fingerprints represent a variety of substructures including amines, quaternary carbons, and pyridines, it is difficult to infer the degree of metabolism. Each bit of Morgan fingerprints contains multiple substructures because they convolute bits of substructural information. Therefore, it is difficult to find a one-to-one substructure that improves CLint when the given fingerprint is 1. Therefore, to improve CLint, it is necessary to carefully check strongly lighted substructures. This model may facilitate the optimization of the compounds.

Figure 3. The substructures are estimated to contribute to distinguishing compound 32759 from high-CLint

The compounds shown in (a) and (b) are the same; compound ID is 32759 in the test set. Measured CLint is high (140 μL/min/mg), and predicted was collectly high (>80 μL/min/mg). The compounds in (a) are colored in substructures with 3247 bits in the Morgan fingerprint (a). The SHAP value was 0.122, and the substructure was highlighted with a vivid pink using the converted SHAP color value. The compound in (b) is shown as a substructure of fingerprint 1387. The substructure was weakly pink because the converted SHAP value was 0.01356. Based on this vividness, the contribution of fingerprint 3247 was higher than that of fingerprint 1387.

The compounds shown in (a) and (b) are the same; compound ID is 32759 in the test set. Measured CLint is high (140 μL/min/mg), and predicted was collectly high (>80 μL/min/mg). The compounds in (a) are colored in substructures with 3247 bits in the Morgan fingerprint (a). The SHAP value was 0.122, and the substructure was highlighted with a vivid pink using the converted SHAP color value. The compound in (b) is shown as a substructure of fingerprint 1387. The substructure was weakly pink because the converted SHAP value was 0.01356. Based on this vividness, the contribution of fingerprint 3247 was higher than that of fingerprint 1387.

4. Conclusion

The development of various ML packages has enabled the construction of in silico models for ADME predictions. However, many ML models have difficulty explaining the reasons for this output. Although essential features and contribution values can be estimated, it is difficult to determine the important substructures in a target compound. This makes it difficult to develop new drug seeds with improved ADME properties.

It is essential to demonstrate the contributions and highlight the critical substructures of these compounds. We visualized the SHAP values calculated using the random forest predictor. Substructures highlighted by SHAP, as well as strongly highlighted portions, may be helpful for structural optimization. However, the model’s ability to accurately classify high-CLint compounds is limited, indicating the need for further refinement in its predictive accuracy. This information should provide hints for further discussions on new compounds. It is anticipated that this report will help identify effective seeds and accelerate drug discovery.

Supplemental information

The datasets used in this study were as follows:

  •    CBIJ_Esaki_CalRep_SHAP_CLint_SI.txt: Text file of the data set

Acknowledgments

This work was partially supported by a grant for research on priority areas from Shiga University (2024) and JSPS KAKENHI (Grant Number 23K14382). We thank Editage (www.editage.jp) for English language editing.

References
  • [1]  Esaki, T.; Watanabe, R.; Kawashima, H.; Ohashi, R.; Natsume-Kitatani, Y.; et al. Data Curation can Improve the Prediction Accuracy of Metabolic Intrinsic Clearance. Mol. Inform. 2019, 38, e1800086. DOI:10.1002/minf.201800086
  • [2]  Liu, R.; Schyman, P.; Wallqvist, A. Critically Assessing the Predictive Power of QSAR Models for Human Liver Microsomal Stability. Inf. Model. 2015, 55, 1566–1575. DOI: 10.1021/acs.jcim.5b00255
  • [3]  Wellawatte, G.P.; Gandhi, H.A.; Seshadri, A.; White, A.D. A Perspective on Explanations of Molecular Prediction Models. J. Chem. Theory Comput. 2023, 19, 2149–2160. DOI: 10.1021/acs.jctc.2c01235
  • [4]  Lundberg, S.M.; Lee, S.-I. A Unified Approach for Interpreting Model Predictions. Advances in NIPS, arXiv, 1705.07874, DOI: 10.48550/arXiv.1705.07874
  • [5]  RDKit: Open-source cheminformatics software.
  • [6]  Kawashima, H.; Watanabe, R.; Esaki, T.; Kuroda, M.; Nagao, C.; et al. DruMAP: A Novel Drug Metabolism and Pharmacokinetics Analysis Platform. J. Med. Chem. 2023, 66, 9697–9709. DOI: 10.1021/acs.jmedchem.3c00481
  • [7]  Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model 2010, 50, 742–754. DOI: 10.1021/ci100050t
  • [8]  Lundberg, S. SHAP Python package, https://github.com/slundberg/shap, 2021
  • [9]  Breiman L. Random Forests. Mach. Learn. 2001, 45, 5–32. DOI: 10.1023/A:1010933404324
  • [10]  Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. DOI: 10.48550/arXiv.1201.0490
  • [11]  Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; et al. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67. DOI: 10.1038/s42256-019-0138-9
 
International (CC BY 4.0) : The images, videos or other third party material in this article are also included in the article’s Creative Commons license.To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

この記事はクリエイティブ・コモンズ [表示 4.0 国際]ライセンスの下に提供されています。
https://creativecommons.org/licenses/by/4.0/deed.ja
feedback
Top