2025 Volume 25 Pages 28-35
The cost and time required for drug discovery have increased, prompting the need for more efficient methods to predict compound properties using artificial intelligence (AI). In particular, the prediction of absorption, distribution, metabolism, and excretion (ADME) properties is crucial. Intrinsic metabolic clearance (CLint) is an essential property in ADME because it affects the side effects or the dosage schedule. In silico machine learning (ML) models for estimating CLint have been developed, but due to the inherent complexity of AI techniques, it is difficult to obtain ideas for a more effective structure. Therefore, we employed SHAP (SHapley Additive exPlanations), an explainable AI (XAI) method, to elucidate the contributions of molecular substructures to CLint predictions. We constructed a random forest model, classified into low and high clearance categories. SHAP values were calculated to visualize the importance of features, and significant substructures influencing CLint were identified. The model demonstrated a high recall for predicting low-CLint compounds; however, its overall accuracy for high-CLint classification was low. This highlights its potential for filtering out low-CLint compounds rather than accurately identifying those with high-CLint. Visualization of SHAP values provided insights into substructure modifications to improve CLint, thus providing valuable guidance for drug optimization. This approach highlights the effectiveness of integrating explainable AI methods to improve the interpretability of ML models in drug discovery.
1. Introduction
The costs and time required for drug discovery have increased, which has encouraged the use of artificial intelligence (AI), including machine learning techniques, for effective drug discovery. An important use of AI is the prediction of the properties of compounds using features of the compound structure. Predicting properties such as absorption, distribution, metabolism, and excretion (ADME) is necessary for drug discovery because ADME directly affects the estimation of the administered dosage and determination of the dosage schedule. Therefore, machine learning (ML) models are required to predict ADME properties.
Intrinsic metabolic clearance was measured using liver microsomes (CLint), which is an in vitro ADME property. CLint is an integrated characteristic of various metabolic enzymes. As they lengthen the time drugs stay in the body and extend the duration of their effect, low-CLint compounds are required for the development of drug-like compounds generally. However, whether compounds need low-CLint depends on the therapeutic goal, not a universal rule for drug development. For chronic conditions with long-lasting effects and long half-lives, low-CLint compounds are required. However, in some cases, for example, drugs that are designed to act for a short period (e.g., anesthetics or certain acute treatments) can benefit from high-CLint to ensure rapid clearance, and drugs with high-CLint are acceptable or even beneficial. There are instances where a high-CLint can be acceptable or even desirable, such as in the case of anesthetics. However, it is also true that metabolism can lead to the emergence of side effects, and for drugs where prolonged efficacy is expected, the case in which low-CLint is required is major. Therefore, to estimate the metabolic characteristics of seed compounds, researchers have constructed AI models to classify the compounds in terms of the CLint degree [1, 2].
However, understanding the reasons for these predictive results is difficult because AI models are generally complex. Understanding the reasons for the prediction is required for the improvement of metabolic stability. Therefore, methods have been developed to explain the results predicted by AI (explainable AI or XAI) [3].
The predicted results Shapley additive explanation (SHAP) [4], a commonly used XAI method, is an effective application. In this study, SHAP was used to evaluate and visualize the substructures that contribute to CLint predictions.
2. Materials and Methods
A predictive model for CLint was constructed to evaluate the effectiveness of SHAP. The collected dataset was separated into training and test sets using a clustering algorithm in Pipeline Pilot (Dassault Biovia [2017]). To predict the CLint of the compounds, the descriptors of the compounds were calculated, and a machine learning model was constructed in a Python environment (ver. 3.9). For model construction and compound visualization, Morgan fingerprint was performed using RDKit (ver. 2024.5.1) [5]. Demonstration of SHAP and its visualization were performed in Python. The details are as follows.
2.1 Dataset
The experimental data on CLint were obtained from a public database DruMAP [6]. Duplicate compounds were removed, and 4218 compounds were retained as datasets for this study. After discussing with medicinal chemistry researchers, the accuracy of the model must be above a certain level in order to discuss explainability because the explainability of a low-precision model is meaningless. Furthermore, since the practically useful threshold for drug discovery varies from phase to phase, the threshold was determined in consideration of the balance between the performance of the machine learning model made by the data set and its pharmacological implications. Therefore, the CLint class was divided into two classes. (Low (CLint ≦ 80 (μL/min/mg)): 3018 compounds, High (80 < CLint (μL/min/mg)): 1200 compounds)
The compounds in the dataset were clustered to separate the training and test sets. Random sampling is typically used to separate the data. However, we used a clustering algorithm to extract test compounds for which similar compounds were not included in the training set. To cluster the compounds, the clustering-molecule component was used in Pipeline Pilot under the following conditions: descriptor FCFP6 (functional connectivity fingerprint for all subgraphs with a diameter of up to size 6) [7], the average number of clusters was 10, MaxDistance was 0.65, and the metric between compounds was calculated by Euclidean distance. Following clustering, the compounds were separated into 27 classes. Cluster number 4 (39 compounds, including low 24 and high 15 compounds) was selected as the test set, and the compounds in the other clusters were defined as training sets. (4179 compounds, including low 2994 compounds and high 1185 compounds)
Table 1. Summary of datasets. Average and standard deviation (SD) of molecular weight and logP were calculated using RDKit
Dataset |
Number of compounds (Low / High) |
Molecular weight Average (SD) |
LogP Average (SD) |
---|---|---|---|
Training |
4179 (2994 / 1185) |
424.487 (91.434) |
3.650 (1.188) |
Test (cluster number 4) |
39 (24 / 15) |
470.292 (29.977) |
4.673 (0.873) |
2.2 Morgan fingerprint calculation and model construction
Morgan fingerprints with radial=2 and 4096 bits were used in RDKit for descriptor calculation. To demonstrate SHAP the decision tree algorithm uses the shap (ver. 0.45.0) module in Python [8]. Among the tree algorithms, random forest (RF) [9] is a high-performance and quick model construction. Therefore, the scikit-learn module [10] was used to construct the RF model.
RF is an ensemble model based on a decision tree algorithm that requires three parameters: n_estimators, the number of trees; max_depth, the maximum depth; and min_samples_split, the minimum number of samples required to split an internal node. The most proper parameters were evaluated using GridsearchCV (parameter: cv=5, score="accuracy"). Consequently, the selected parameters were determined as follows (Table 2). Based on these metrics, this model demonstrated a high recall for predicting low-CLint compounds but had difficulty accurately predicting high-CLint compounds. The aim of this study is to examine the explanatory potential of XAI, and it is not assumed that the model itself will be used in the field as is. Therefore, although a high prediction accuracy is desired, the model was deemed acceptable because of its ability to correctly infer low-CLint compounds in practice.
Table 2. Summary of Random Forest training
Parameter | Value (Values used for parameter tuning) |
---|---|
n_estimators | 500 (300, 400, 500, 600, 700) |
max_depth | 17 (14, 15, 16, 17, 18) |
min_sample_split | 8 (5, 6, 7, 8, 9) |
Performance | Score |
Best mean accuracy | 0.734 |
Standard deviation for CV | 0.019 |
Sensitivity (high-CLint) | 0.917 |
Sensitivity (low-CLint) | 0.067 |
Specificity (high-CLint) | 0.067 |
Specificity (low-CLint) | 0.917 |
Matthews correlation coefficient | -0.030 |
Area of under the curve | 0.492 |
F-1 score | 0.733 |
2.3 Calculation of SHAP
SHAP calculates the importance value of each feature for prediction based on game theory. This method is based on an additive model and the feature contribution is quantitatively calculated. The function, TreeExplainer [11] in the shap module of Python was used to calculate the SHAP values of the test compounds. Waterfall plots and the contribution of each sample were visualized.
3. Results and Discussion
3.1 Predictive Accuracy
The optimized model was used to predict the CLint classes of the test compounds, and the results are listed in Table 3. Using the confusion matrix, the accuracy of this model was 0.590, which was not a high score. This model tends to estimate compounds to be “low”-CLint. The training set seemed to have a few similar compounds to the test set because the dataset was separated by structure rather than randomly. However, the model had a high recall (0.917) for predicting low-CLint compounds, meaning it effectively selected low-CLint candidates.
Table 3. Confusion matrix for CLint classification
Predicted Class | |||
---|---|---|---|
Low | High | ||
Actual Class | Low | 22 | 2 |
High | 14 | 1 |
However, its accuracy for high-CLint classification was low, indicating that it struggled to correctly identify high-CLint compounds. The aim of this study is to examine the explanatory potential of XAI, and it is not assumed that the model itself will be used in the field as is. Therefore, although a high prediction accuracy is desired, the model was deemed acceptable because of its ability to correctly infer low compounds in practice.
3.2 SHAP value for all test compounds
The SHAP values for all test compounds were calculated and plotted as a summary plot (Figure 1). The high SHAP value in this model increased the probability of categorizing compounds as high-CLint. Fingerprint 3247 is associated with an increased probability of high-CLint, as indicated by its SHAP values. Similarly, the absence of fingerprints 3291 and 3802 is associated with a decreased probability of classification into the high-CLint category. These results suggest that the presence or absence of specific fingerprints can influence the model’s prediction of high-CLint compounds. Having many substructures (fingerprint=1) results in increased sites for metabolic enzymes to metabolize; therefore, a high feature value (fingerprint=1) tends to push up the SHAP value to categorize the compounds into the high-CLint class.
Figure 1. Contributions of descriptors to high-CLint classification
The x- and y-axes show the SHAP values and fingerprints, respectively. A high SHAP value (right side of the x-axis) tended to distinguish the compound with a high-CLint. The color bar shows the volume of the fingerprint; pink and blue indicate fingerprints = 1 and 0, respectively.
3.3 SHAP value for a target test compound and visualization of essential substructure
Low-CLint compounds are required for the development of drug-like compounds generally. Therefore, to examine the features that contribute to the output of the provability of distinguishment into high-CLint, the feature contribution of compound name 32759 (both observed and predicted classes were high) in the test set was represented as a SHAP value in Figure 2. In this figure, fingerprint 3247 is the feature that most strongly indicates the probability of a high-CLint class moving toward the output value. Fingerprints 3291 and 1066 push the probability to the high-CLint category.
Figure 2. Contributions of descriptors to high-CLint classification
The force plot shows the contribution of each fingerprint in distinguishing compound 32759 from the high-CLint class. Fingerprint 3247 significantly increased the output as follows: 3291 and 1066. The bold value, which indicates the probability that this compound is distinguished into a high class, implies that the model predicts this compound with low confidence.
The substructures estimated to make a significant contribution are shown in the structure. (Figure 3) The substructure of fingerprint 3247 is a part of the substructure consisting of atoms bonded to the central carbon atoms belonging to the benzene ring, within a radius of 2. This substructure pushes the output value to a high class; therefore, the structure is colored according to the SHAP value. SHAP value was converted using the following formula to determine the color. The substructure, which increases the probability of being a “high” class, was colored pink based on the converted SHAP color value (here, s represents SHAP values and max_shap means the max value of the shap score.)
converted SHAP color value =(1-sqrt(|s|))/(max_shap)
The substructures of high importance for estimating CLint were clarified by showing their structures. Furthermore, the essential substructures were revealed by varying the color intensity based on the SHAP values. The substructure of fingerprint 3247 indicates that it is likely to accelerate metabolism. When we checked compounds with fingerprint 3247 in the training data, many represented parts of the benzene rings in the training compounds. (Figure S1 in Supporting Information) Fingerprint 3247 represents a substructure of a benzene ring, so it is generally easily metabolized, and it is reasonable that fingerprint 3247 is found in compounds with high-CLint while it is also found in compounds with low-CLint, but there are many hydrophilic parts in the parts other than fingerprint 3247. Substructure 3247 represents a part of the benzene ring that could be added.
Next, we examined the compound with 1387 in the training data and found that it corresponded to multiple substructures. (Figure S2 in Supporting Information) Although fingerprints represent a variety of substructures including amines, quaternary carbons, and pyridines, it is difficult to infer the degree of metabolism. Each bit of Morgan fingerprints contains multiple substructures because they convolute bits of substructural information. Therefore, it is difficult to find a one-to-one substructure that improves CLint when the given fingerprint is 1. Therefore, to improve CLint, it is necessary to carefully check strongly lighted substructures. This model may facilitate the optimization of the compounds.
Figure 3. The substructures are estimated to contribute to distinguishing compound 32759 from high-CLint
The compounds shown in (a) and (b) are the same; compound ID is 32759 in the test set. Measured CLint is high (140 μL/min/mg), and predicted was collectly high (>80 μL/min/mg). The compounds in (a) are colored in substructures with 3247 bits in the Morgan fingerprint (a). The SHAP value was 0.122, and the substructure was highlighted with a vivid pink using the converted SHAP color value. The compound in (b) is shown as a substructure of fingerprint 1387. The substructure was weakly pink because the converted SHAP value was 0.01356. Based on this vividness, the contribution of fingerprint 3247 was higher than that of fingerprint 1387.
The compounds shown in (a) and (b) are the same; compound ID is 32759 in the test set. Measured CLint is high (140 μL/min/mg), and predicted was collectly high (>80 μL/min/mg). The compounds in (a) are colored in substructures with 3247 bits in the Morgan fingerprint (a). The SHAP value was 0.122, and the substructure was highlighted with a vivid pink using the converted SHAP color value. The compound in (b) is shown as a substructure of fingerprint 1387. The substructure was weakly pink because the converted SHAP value was 0.01356. Based on this vividness, the contribution of fingerprint 3247 was higher than that of fingerprint 1387.
4. Conclusion
The development of various ML packages has enabled the construction of in silico models for ADME predictions. However, many ML models have difficulty explaining the reasons for this output. Although essential features and contribution values can be estimated, it is difficult to determine the important substructures in a target compound. This makes it difficult to develop new drug seeds with improved ADME properties.
It is essential to demonstrate the contributions and highlight the critical substructures of these compounds. We visualized the SHAP values calculated using the random forest predictor. Substructures highlighted by SHAP, as well as strongly highlighted portions, may be helpful for structural optimization. However, the model’s ability to accurately classify high-CLint compounds is limited, indicating the need for further refinement in its predictive accuracy. This information should provide hints for further discussions on new compounds. It is anticipated that this report will help identify effective seeds and accelerate drug discovery.
Supplemental information
The datasets used in this study were as follows:
Acknowledgments
This work was partially supported by a grant for research on priority areas from Shiga University (2024) and JSPS KAKENHI (Grant Number 23K14382). We thank Editage (www.editage.jp) for English language editing.