2024 Volume 21 Issue 3 Article ID: e210021
Computerized molecular docking methodologies are pivotal in in-silico screening, a crucial facet of modern drug design. ChooseLD, a docking simulation software, combines structure- and ligand-based drug design methods with empirical scoring. Despite advancements in computerized molecular docking methodologies, there remains a gap in optimizing the predictive capabilities of docking simulation software. Accordingly, using the docking scores output by ChooseLD, we evaluated its performance in predicting the bioactivity of G-protein coupled receptor (GPCR) and kinase bioactivity, specifically focusing on Ki and IC50 values. We evaluated the accuracy of our algorithm through a comparative analysis using force-field-based predictions from AutoDock Vina. Our findings suggested that the modified ChooseLD could accurately predict the bioactivity, especially in scenarios with a substantial number of known ligands. These findings highlight the importance of selecting algorithms based on the characteristics of the prediction targets. Furthermore, addressing partial model fitting with database knowledge was demonstrated to be effective in overcoming this challenge. Overall, these findings contribute to the refinement and optimization of methodologies in computer-aided drug design, ultimately advancing the efficiency and reliability of in-silico screening processes.
The paper presents an evaluation of the accuracy of molecular docking, a crucial concern in in-silico screening. The fine-tuning of ChooseLD using registered data yielded enhanced accuracy, particularly in predicting Ki and IC50 values for G-protein coupled receptors and kinases in cases with numerous collected ligands, compared to that of AutoDock Vina, a force-field-based approach. This report provides insights into a suggested docking simulation approach, discovered through a comparative analysis of predictive data from ChooseLD and Vina.
In-silico screening, facilitated by computer-aided molecular docking methods, serves as an important technique in modern drug design. This approach is classified into ligand-based drug design (LBDD) and structure-based drug design (SBDD) [1]. LBDD involves the construction of drug structures based on known chemical information [2], whereas SBDD predicts ligand–protein interactions using three-dimensional structures [3]. Typically, these two methods are combined. ChooseLD [4,5, Supplementary Text S1, Table S1] was designed to evaluate the interaction between proteins and candidate compounds using SBDD methodologies. The analytical design of this approach is summarized in the Supplementary Text S1. In SBDD, three methods are established: force field-, empirical-, and knowledge-based designs [1]. Among these methods, ChooseLD predominantly employs empirical scoring, characterized by knowledge-based scoring. The processing speed of the empirical-based method is fast owing to its utilization of a simple empirical term; however, its estimation performance depends on the quality of the knowledge database.
Contemporary databases, such as ChEMBL [6], play a vital role in drug discovery research by providing mass-cleaned datasets. Docking simulation methodologies and the databases are progressing, although the predictive capabilities of molecular docking are not fully complete and the contributing factors are not fully understood. This study aimed to enhance the predictive capabilities of ChooseLD by tuning its prediction parameters using ChEMBL-provided data. Subsequently, the obtained calculated scores were compared to those of AutoDock Vina (version 1.1.2) [7] (subsequently denoted as “Vina”), leveraging a force field-based scoring methodology as the main algorithm in SBDD. This comparative analysis revealed that the learning model established using modified ChooseLD was efficacious in predicting bioactivity data associated with proteins with numerous experimentally analyzed ligands in homologous protein structures.
The Assay IDs registered as curated by an expert and with a confidence score of 9 were obtained from the ChEMBL31 database on the ChEMBL website [8]. Based on the obtained Assay IDs, compound structures in the Simplified Molecular Input Line Entry System (SMILES) format with experimental Ki and IC50 values were extracted. Simultaneously, the amino-acid sequences of target proteins corresponding to each Assay ID were extracted without any additional conditions. Each protein could have an individual Assay ID, and multiple IC50 and Ki values can be obtained for the same compound. In the present study, all extracted data were used to treat each “protein-assay-affinity value” as a single dataset. The obtained SMILES data were converted into MOL files with 3D structures using RDKit [9], employing the Universal Force Field (UFF) [10] as the force field. Because ChEMBL only provides information on the amino acid sequences of target proteins, the 3D structures could not be obtained directly. Therefore, homologous protein structures corresponding to the constructed data were identified through BLAST searches [11] based on the Protein Data Bank (PDB) [12] (as of August 12, 2022). The protein–ligand complex structure with the highest BLAST score was selected as the base template for homology modeling. Subsequently, the predicted ligand structure and/or location were fitted to the active site of the base protein template structure employing the protein–ligand 3D location information of secondary or subsequent BLAST outputs (subsequently denoted as “PDB ligands”). During structural fitting, the least-squares fitting was implemented using CEAlign from PyMOL [13], and estimated ligand models in contact with proteins outside the active site or penetrating the template structure were removed. Subsequently, we conducted a 3D homology modeling of the protein structure with ligands in FAMS [14] guided by the local alignment results between the highest-scoring protein sequence from protein BLAST and amino acid sequences from ChEMBL. FAMS is a homology modeling method characterized by iterative database searches and simulated annealing. Modeling with FAMS ensures that PDB ligands do not overlap spatially with the modeled protein.
In-silico Screening Based on Empirical ScoresThe ChooseLD score, total score for docking evaluation, contains three outputs: BaseScore, Overlap, and Hc_index, expressed as Equation (1).
| (1) |
First, the pharmaceutical candidate compound is moved to the docking position by performing singular value decomposition (SVD) using a randomly selected set of coordinates from the common MACCS fingerprints of PDB ligands and pharmaceutical candidate compounds. A single docking score is calculated using the coordinates. BaseScore exhibits higher values when pharmaceutical candidate compounds are situated in proximity to PDB ligands, as illustrated in the score calculation schema model depicted in Figure 1. Conversely, overlap represents a score that decreases when pharmaceutical candidate compounds are embedded within the protein model structure and increases when they are positioned proximal to the surface (Figure 1). The definitions for the inside and the surface of the proteins were determined based on the Chemistry Data Booklet. Details are described in the Supplementary Text S1.

The Hc_index is calculated based on the AlogP value, which indicates the hydrophobicity strength of atoms within the protein, as determined using RDKit. This estimation considers the effect of the surrounding space of hydrophobic atoms (Figure 1). Equation (2), determined based on a previous study [5], can be used to calculate the hydrophobic effect of a single atom within a pharmaceutical compound. AlogP, instead of logP, was used to perform calculations for each atom of the target protein and candidate compounds. In Equation (2), “r” represents the distance between the i-th atom of the ligand molecule and the j-th atom of the receptor molecule, respectively. Equation (2) is shown as follows:
| (2) |
where U and r(i,j) represent the neighbor of the i-th atom of the ligand molecule and the distance between the i-th atom of ligand molecule and the j-th atom of the receptor molecule, respectively. In the present study, “rth” was set to 10 Å. Extending this calculation to the entire molecule, the Hc_index was obtained as a cumulative sum of the hydrophobicity indices for all atoms, as expressed in Equation (3), and was numerically quantified to assess the overall hydrophobic interaction strength of the molecule. Thus, Hc_index is expressed as follows:
| (3) |
ChooseLD has three parameters: k1 to k3. In this case, we used values that were tuned in a previous study (Supplementary Text S1) for the entire ChEMBL28 dataset, specifically 1.0 for k1, 15.0 for k2, and 100.0 for k3. Since 90% of the ChEMBL31 dataset is composed of data from the ChEMBL28 dataset, tuning parameters were assumed to be similar and used without further modification. A more detailed explanation can be found in the Supplementary Text S1. Additionally, ChooseLD is designed to operate with an execution time of approximately 1 min, including data preparation such as retrieving PDB ligands, making it almost the same as Vina in terms of execution time.
Machine Learning of Score and Bioactivity DataTo convert mass bioactivity data of drug candidate compounds into a machine-learning-compatible format, we converted the Ki and IC50 values from nanomolar units to integer values on a logarithmic scale with a base of 10 (subsequently denoted as “experimental values”). These experimental values are standardized to a uniform scale, such as pIC50 and pKi, and are represented as shown in Equations (4) and (5). The range of pIC50 and pKi values was 6–14, but values close to 0 are desirable for training our machine learning model. In the present study, we introduced qIC50 and qKi Equations (6) and (7) instead of typical pIC50 and pKi values. The difference between typical pIC50/pKi values and our qIC50/qKi values are that the signs are reversed, and the units are expressed in nM instead of M.
| (4) |
| (5) |
| (6) |
| (7) |
In the following texts, qIC50exptl/qKiexptl and qIC50calc/qKicalc represent the experimental and predicted values, respectively. Three scores were used as training factors, namely BaseScore, Overlap, and Hc_index, obtained from ChooseLD along with the logarithm of the bioactivity data. The predictive model was constructed using scikit-learn (version 0.23.2) [15], specifically employing its Support Vector Classifier (SVC) algorithm within the Support Vector Machine (SVM) framework, with a five-fold cross-validation-supervised learning approach. To determine the optimal hyperparameters for the model, we applied the grid search method using the GridSearchCV object. After identifying the best parameters, the final model was retrained using the entire dataset. The regularization parameter C was explored within the range [0.1, 1, 10, 100], the kernel coefficient γ within [1, 0.1, 0.01, 0.001], and the types of kernels considered were [“Linear Kernel,” “Radial Basis Function Kernel,” and “Sigmoid Kernel”]. Gamma parameter is not used in linear kernel models as it is not an included option. The SVM models in ChooseLD are calculated using the “Linear Kernel,” “RBF Kernel,” and “Sigmoid Kernel” as shown in Equations (8), (9), and (10).
| (8) |
| (9) |
| (10) |
Furthermore, a similar learning model was constructed using the calculated scores of Vina, and the results of both models were compared. As Vina was reported to yield better prediction compared to Autodock 4 [16], we used the output of Vina, which includes binding affinity energies encompassing physical interactions such as electrostatic interactions [7]. As Vina score cannot be output separately by type but only combined, the results were used in this study as training data. The Vina score calculation involved a 50×50×50 grid (--size x, y, z) centered on the centroid coordinates of the PDB ligand (--center x, y, z), as obtained from CEAlign. The large grid size is used to explore the entire protein because it is not known where the pharmaceutical candidate compound will bind. Compounds whose counterparts are G-protein coupled receptors (GPCRs) or kinases possessing available qKiexptl values and qIC50exptl assay data from ChEMBL were selected for learning and binding prediction. The biological names of the GPCRs and kinases used in the analysis are listed in Supplementary Tables S2 and S3. Molregno, compound ID within ChEMBL, are present in large quantities; therefore, a dedicated website (http://www.bio.chuo-u.ac.jp/iwadate/DrugSearch/) has been set up to list them.
Predicted Data Analysis and StatisticsOnly GPCRs and kinases were included in this study to circumvent the challenges of correctly identifying the activity pocket and to mitigate potential inaccuracies in predictions associated with enzymes featuring cofactors and/or associated with small molecules. The accuracy rate of each method was determined by dividing the number of matched data with another model by the total output number. To assess the differences in the results of ChooseLD and Vina prediction, the root mean square deviation (RMSD) of the maximum common substructure between the PDB ligand and that of the predicted structure from each tool were calculated. Statistical comparison of the two grouped RMSD values was performed using a two-sided Mann-Whitney U-test. As the PDB ligands selected for each PDB ligand have different maximum common substructures using RDKit, we calculated the distances between the center-of-mass coordinates of each PDB ligand and those of the predicted structures from each tool as a common indicator and performed similar tests. Cliff’s delta (d) was also calculated as an effect size for statistical criterion owing to the large sample size. The tests focused on the differences in protein-PDB ligand distances between the predicted ChooseLD and Vina results or prediction success and failure groups. The distance data were grouped under the following conditions: the prediction tools grouping (ChooseLD and Vina) and Vina prediction accuracy (success or fail). Statistical significance was set at p<0.05. Clliff’s delta was interpreted as follows: negligible (|d|<0.147), small (|d|<0.33), medium (|d|<0.474), and large (|d|>0.474) according to a previous study [17], and values in the small, medium, and large categories were considered statistically significant. Subsequently, we extracted the qIC50calc and qKicalc values for GPCR and kinase data corresponding to the cases where the predictions of Vina matched the experimental values. The cases wherein the difference between the predicted and experimental values by ChooseLD exceeded 1 were defined as “failure.” These instances were counted and calculated as a percentage. The number and distribution of ligands collected per target protein were analyzed.
The prediction accuracy of each tool is presented in Figure 2. Both Vina and ChooseLD exhibited better accuracy than random prediction. Notably, ChooseLD demonstrated considerably higher prediction accuracy for kinase qKiexptl values. For Vina, a marked difference with random prediction in qIC50exptl estimation was observed, and the gap in prediction accuracy with ChooseLD appeared narrow.

We conducted a comparative assessment of the two qKi values for GPCR which had the largest cases to analyze the correlation of predicted experimental values with qKiexptl. The comparison between ChooseLD and Vina results is summarized in Table 1 and Table 2, respectively. Both Vina and ChooseLD results exhibited inaccuracies in predicted values below –1 and above 5. A notable difference was observed in the output range between the two software packages, and ChooseLD could predict across a broader range. Comparing concordance rates, Vina achieved more correct predictions for qKi value of 0 (ChooseLD: 2,982, Vina: 3,373), while ChooseLD outperformed Vina in predicting values other than 0. The analysis of the quantitative distribution of ChooseLD predictions revealed peaks in qKiexptl range of 0–2 (blue, green, and yellow lines in Figure 2). Vina predictions exhibited a higher frequency of 0 or 1 values (Figure 2). A similar result was observed for qIC50 values. The final scoring equations for ChooseLD and Vina, as a result of the training, are provided in Supplementary Tables S4 and S5. As the RBF kernel was used for all learning models, all subsequent data presented reflect the results obtained using the RBF kernel.
| Prediction | qKiexptl | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| –3 | –2 | –1 | 0 | 1 | 2 | 3 | 4 | >4 | |
| –1 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 |
| 0 | 0 | 4 | 93 | 2982 | 2453 | 2261 | 1265 | 180 | 30 |
| 1 | 0 | 2 | 63 | 2196 | 2623 | 2207 | 1249 | 242 | 30 |
| 2 | 1 | 4 | 71 | 2009 | 2181 | 2585 | 1417 | 267 | 36 |
| 3 | 0 | 0 | 10 | 185 | 193 | 264 | 289 | 81 | 12 |
| 4 | 0 | 0 | 0 | 9 | 13 | 8 | 20 | 16 | 1 |
Rows represent values predicted by ChooseLD, whereas columns represent qKiexptl. Areas where the qKiexptl and predicted values match are highlighted in gray. The table provides a summary of the results of multiple affinity prediction results for the qKiexptl values of GPCRs. GPCR, G protein-coupled receptor; Ki: inhibition constant.
| Prediction | qKiexptl | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| –3 | –2 | –1 | 0 | 1 | 2 | 3 | 4 | 4< | |
| –1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 4 | 91 | 3373 | 2998 | 2903 | 1486 | 213 | 42 |
| 1 | 0 | 5 | 74 | 2129 | 2317 | 2399 | 1452 | 360 | 39 |
| 2 | 0 | 1 | 71 | 1874 | 2124 | 2004 | 1283 | 204 | 26 |
| 3 | 0 | 0 | 1 | 7 | 24 | 20 | 19 | 9 | 1 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Rows represent predicted values by Vina, whereas columns represent qKiexptl. Areas where the qKiexptl and predicted values match are highlighted in gray. Dark gray-colored values indicate instances where the predicted output was present in ChooseLD but did not return a predicted value in Vina. The table provides a summary of the results of multiple affinity prediction results for qKiexptl values of GPCRs. GPCR, G protein-coupled receptor; Ki: inhibition constant.
Figure 3 display instances where ChooseLD predicted ligand-GPCR binding more accurately than Vina. For example, ChooseLD accurately predicted a score of 0, whereas Vina inaccurately predicted a score of 3 in cases where qKiexptl value was 0. In these cases, the structures predicted by ChooseLD exhibited spatial overlap with the PDB ligands, whereas those predicted by Vina were distant from the PDB ligands. Figure 3 depict cases where Vina could accurately predict the structure. In these cases, the structures predicted by ChooseLD did not overlap, and predicted models were located in PDB ligands that were not spatially well-aligned where the experimental value was 0, whereas Vina accurately predicted a score of 0 (incorrectly predicted as 3 by ChooseLD). In contrast, the structures predicted by Vina were spatially closer to the PDB ligands.

To analyze causal factors associated with docking model difference, the RMSD of the maximum common substructure using RDKit between the PDB ligand and that of the predicted structure from each tool were calculated and statistically compared (Table 3 and Table 4). We also included the results obtained from analyzing the distances between the center-of-mass coordinates (Table 5 and Table 6). ChooseLD output was significantly closer to the PDB ligands than the predicted results of Vina for the experimental value 0 data, which was the same experimental value used in the analysis resulting in Figure 3 (Table 3 and Table 5). Similar results were observed for the other experimental values, suggesting that ChooseLD tended to predict the structure approaching PDB ligands at close distance (Supplementary Tables S6–S15). Furthermore, differences were also observed in the distance between Vina output-PDB ligands in cases where Vina made successful predictions or not. Distances were significantly smaller in cases where Vina made successful predictions that corresponded to the structure accurately predicted to bind the PDB ligand, particularly when considering qIC50calc values (Table 4 and Table 6 and Supplementary Tables S16–S26). “Success” and “Failure” refer to whether Vina successfully predicted the digit of binding-affinity value. Note that this does not indicate whether the docking position was a “Success” or “Failure.”
| GPCR | |||||||
|---|---|---|---|---|---|---|---|
| Activity type | Number of data | Vina median (Å) | Vina range (Å) | ChooseLD median (Å) | ChooseLD range (Å) | p-value | d |
| qKicalc | 7329 | 5.311 | 0.888–30.792 | 4.269 | 0.350–17.275 | 0.001> | 0.337 |
| qIC50 calc | 3050 | 5.962 | 1.275–33.201 | 5.095 | 0.819–21.676 | 0.001> | 0.228 |
| Kinase | |||||||
| Activity type | Number of data | Vina median (Å) | Vina range (Å) | ChooseLD median (Å) | ChooseLD range (Å) | p-value | d |
| qKicalc | 160 | 4.89 | 1.685–28.989 | 2.906 | 0.991–14.807 | 0.001> | 0.650 |
| qIC50 calc | 1473 | 5.619 | 1.816–31.017 | 3.296 | 0.666–19.272 | 0.001> | 0.574 |
Rows represent predicted GPCRs and kinases at qKicalc and qIC50calc values, respectively. Columns represent activity type, the number of the data, median distance with range from each tool output, p-value calculate using Mann-Whitney’s U-test, and Cliff’s delta (d). RMSD, root mean square deviation; PDB: Protein Data Bank; GPCR, G protein-coupled receptor; Ki: inhibition constant; IC50: half-maximal inhibitory concentration.
| GPCR | ||||||||
|---|---|---|---|---|---|---|---|---|
| Activity type | NTrue a | NFail b | NTrue median (Å) | NTrue range (Å) | NFail median (Å) | NFail range (Å) | p-value | d |
| qKicalc | 3336 | 3993 | 5.311 | 1.458–24.568 | 5.311 | 0.888–30.792 | 0.170 | –0.019 |
| qIC50 calc | 711 | 2339 | 5.61 | 1.275–23.408 | 6.051 | 1.370–33.201 | 0.001> | –0.155 |
| Kinase | ||||||||
| Activity type | NTrue a | NFail b | NTrue median (Å) | NTrue range (Å) | NFail median (Å) | NFail range (Å) | p-value | d |
| qKicalc | 0 | 160 | N.A. | N.A. | 4.89 | 1.685–28.989 | N.A. | N.A. |
| qIC50 calc | 2 | 1471 | 4.663 | 4.487–4.839 | 5.624 | 1.816–31.017 | 0.377 | –0.362 |
Rows represent predicted GPCRs and kinases at qKicalc and qIC50calc values, respectively. Columns represent activity type, the number of the data, median distance with range from each case, p-value calculated using Mann-Whitney’s U-test, and Cliff’s delta (d). a: total number of successfully predicted compounds; b: total number of unsuccessfully predicted compounds. Kinase_qKicalc values were unavailable because the corresponding data did not exist (shown as N.A., not available). RMSD, root mean square deviation; PDB: Protein Data Bank; GPCR, G protein-coupled receptor; Ki: inhibition constant; IC50: half-maximal inhibitory concentration.
| GPCR | |||||||
|---|---|---|---|---|---|---|---|
| Activity type | Number of data | Vina median (Å) | Vina range (Å) | ChooseLD median (Å) | ChooseLD range (Å) | p-value | d |
| qKicalc | 7383 | 2.741 | 0.041–28.374 | 1.412 | 0.031–14.744 | 0.001> | 0.480 |
| qIC50 calc | 3080 | 3.374 | 0.164–31.694 | 1.698 | 0.040–15.834 | 0.001> | 0.439 |
| Kinase | |||||||
| Activity type | Number of data | Vina median (Å) | Vina range (Å) | ChooseLD median (Å) | ChooseLD range (Å) | p-value | d |
| qKicalc | 160 | 4.593 | 0.186–25.513 | 0.606 | 0.071–9.998 | 0.001> | 0.843 |
| qIC50 calc | 1473 | 3.608 | 0.079–28.909 | 1.037 | 0.033–13.507 | 0.001> | 0.546 |
Rows represent predicted GPCRs and kinases at qKicalc and qIC50calc values, respectively. Columns represent activity type, the number of the data, median distance with range from each tool output, p-value calculated using Mann-Whitney’s U-test, and Cliff’s delta (d). Protein Data Bank; GPCR, G protein-coupled receptor; Ki: inhibition constant; IC50: half-maximal inhibitory concentration.
| GPCR | ||||||||
|---|---|---|---|---|---|---|---|---|
| Activity type | NTrue a | NFail b | NTrue median (Å) | NTrue range (Å) | NFail median (Å) | NFail range (Å) | p-value | d |
| qKicalc | 3373 | 4010 | 2.452 | 0.057–25.725 | 2.966 | 0.041–28.374 | 0.001> | –0.162 |
| qIC50 calc | 713 | 2367 | 2.490 | 0.194–23.707 | 3.604 | 0.164–31.694 | 0.001> | –0.291 |
| Kinase | ||||||||
| Activity type | NTrue a | NFail b | NTrue median (Å) | NTrue range (Å) | NFail median (Å) | NFail range (Å) | p-value | d |
| qKicalc | 0 | 160 | N.A. | N.A. | 4.593 | 0.186–25.513 | N.A. | N.A. |
| qIC50 calc | 2 | 1471 | 1.513 | 1.469–1.558 | 3.618 | 0.079–28.909 | 0.054 | –0.789 |
Rows represent predicted GPCRs and kinases at qKicalc and qIC50calc values, respectively. Columns represent activity type, the number of the data, median distance with range from each case, p-value calculated using Mann-Whitney’s U-test, and Cliff’s delta (d). a: total number of successfully predicted compounds; b: total number of unsuccessfully predicted compounds. Kinase_qKicalc values were unavailable because the corresponding data did not exist (shown as N.A., not available). PDB: Protein Data Bank; GPCR, G protein-coupled receptor; Ki: inhibition constant; IC50: half-maximal inhibitory concentration.
To identify the factors that negatively affected the predictions of ChooseLD, we analyzed the characteristics of data in cases where Vina made accurate predictions. Kinase prediction exhibited a markedly lower failure rate than that of GPCR, and qIC50calc values demonstrated a lower incorrect rate than qKicalc values (Table 7).
| qKicalc | qIC50 calc | |
|---|---|---|
| GPCR | 6.15 | 4.51 |
| Kinase | 1.83 | 1.06 |
Failure cases were identified as instances where Vina predicted accurately and the difference between predicted and experimental values of ChooseLD exceeded 1 for qKi and qIC50 values in both GPCRs and kinases. All values are presented as percentages (%).
To assess this difference, the number and distribution of ligands collected per target protein were analyzed (Figure 4). The ligand count for kinases reached up to approximately 30 PDB ligands, whereas that of GPCRs was up to 21 ligands. This observation suggests that the number of ligands in homologous proteins of GPCRs tends to be smaller than that of kinases.

In this study, we established the predictive model of the logarithm of the bioactivity data (qKiexptl and qIC50exptl values) for GPCRs and kinases using a learning model based on ChooseLD, which incorporates BaseScore, Overlap, and Hc_index, as well as another learning model that utilizes the Vina score. Notably, this model exhibited more effective learning than the model using random scores. Differences in predictive accuracy were observed between GPCRs and kinases, specifically the each qKicalc and qIC50calc values. Concordantly, analysis of the correlation between the predicted and experimental values revealed peaks at positions where ChooseLD predictions aligned with the experimental values (Figure 2). This observation suggests that the three ChooseLD scores provided highly accurate and precise predictions. However, the peak positions did not coincide when the experimental values were below –1 and exceeded 2 (Figure 2), indicating increased prediction difficulty for experimental values outside the range of 0–2. This trend was more pronounced for Vina. In virtual screening, it is crucial to preferentially select compounds that have or do not have activity. However, the current method does not provide a model that could distinguish between the presence and absence of activity. In this study, we classified the experimental values based on their orders of magnitude for training and treated all data as “active.” However, we suspect that treating data with values greater than 4 digits as “inactive” and training the model to differentiate between “active” and “inactive” compounds would lead to a more virtual screening-oriented learning model.
In instances where Vina predictions were accurate, and ChooseLD encountered failure, cases were identified where multiple ligands exhibited spatial overlap in a disordered manner (Figure 3). The elevated failure rate in GPCR is attributed to insufficient ligand information for accurate empirical method prediction (Figure 4). The Vina prediction process involves grid exploration from the initial placement [7], independent of the reduction in the number of ligands. This stands in contrast to the structure prediction by ChooseLD, which relies on more ligand information. Consequently, Vina yielded relatively accurate results in these instances. Based on these insights, three major factors were considered to be associated with the failure rate of ChooseLD: the spatial ordering of PDB ligands, the adequacy of the number of PDB ligands, and the nature of the prediction target (qKiexptl or qIC50exptl values).
In this study, the spatial orderliness of PDB ligands was predicted based on the homology of amino acid sequences using BLAST. Despite high homology between proteins, variations in microstructure and the binding site environment among individual proteins could lead to mispredictions. Therefore, improving molecular positioning through targeted repositioning is pivotal for accurate structure prediction. To address this challenge, we propose transitioning to a selective fitting method that references a single ligand with high accuracy instead of multiple ligands. Subsequently, docking can be performed using the optimal conformation obtained from force-field-based docking software, such as Vina and Rosetta [18]. This modification is speculated to be particularly effective for ChooseLD, with careful tuning, in cases where the empirical scoring method fails (Figure 3). In contrast, the modified ChooseLD established in this study exhibited sufficient performance for cases with well-aligned PDB ligands (Figure 3) and similar instances.
The number of collected PDB ligands is a factor required for accurate prediction. Our results suggest that a small number of collected ligands passively decreased structural predictability, as evidenced by the lower failure rate for kinases when a larger number of ligands was obtained than that for GPCRs (Table 7 and Figure 4). On the condition that the number of PDB ligands is shortened and/or the size of the drug candidate compounds is remarkably smaller than the pocket size, the exploration space could restrict unintentionally obtained inadequate prediction results. To address this limitation, broadening the range of obtaining PDB ligands from BLAST search results may be preferable. In cases where obtaining sufficient ligands is challenging, force field-based docking tools exemplified by Vina could represent a more favorable approach.
In conclusion, understanding the differences in the characteristics of qKiexptl and qIC50exptl values can provide valuable insights for result interpretation. Our results suggest that ChooseLD exhibited a higher failure rate in predicting qKiexptl than qIC50exptl values. qKiexptl values indicate the binding strength between an enzyme and an inhibitor, whereas qIC50exptl represents the inhibition potential (as a measure of drug amount) obtained from biochemical assays, reflecting experimentally different properties [19]. In this study, ChooseLD demonstrated better predictive performance for qIC50calc than for qKicalc (Table 7), suggesting that the empirical scoring based on PDB ligand collection may partially contain information about the enzyme-catalyzed reaction, which is the target of inhibition. A plausible factor contributing to these observations is the strength of physical interactions. In the present process, predictions were solely based on the structure without considering other factors associated with protein–ligand binding (or atom-atom bonds). However, the molecular interaction is affected by the molecular competition of ligands or substrate saturation in the physiological micro-environment, representing a potential limitation in the present prediction paradigm [20,21]. Therefore, we speculated that incorporating physical interactions that are currently not primarily considered by ChooseLD could potentially enhance the accuracy of qKicalc, bringing them closer to that of qIC50calc.
This study focused on GPCRs and kinases for high-throughput execution without extending our method to other types of proteins owing to complexities associated with the spatial placement of ligands. While overcoming the above limitations in handling large-volume data with a high-throughput approach remains difficult, we have identified some potential improvements. In addition, through the analysis of the differences between ChooseLD and Vina outputs, we clarified the potential advantage of machine learning in docking simulation. Further comparison analysis and evaluation of machine-learning-based methods, including our tools, are required to better understand the advantages offered by each learning method. This approach is our next objective for predicting ligand affinity for proteins in future studies. The method established in this study may have broader applicability with further tunings, extending beyond GPCRs and kinases to other types of proteins. Moreover, addressing the limitations revealed in this study regarding the ligand data selection process could pave the way for high-throughput execution.
ChooseLD, which employs empirical scoring, is recommended when a sufficient number of spatially ordered PDB ligands are available. Software leveraging force-field-based screening, such as Vina, holds an advantage in predicting proteins with limited available ligand information or ligands exhibiting poor spatial orderliness. The use of the learning model of ChooseLD that leverages empirical scoring is expected to be beneficial for targets extending beyond GPCRs and kinases, provided there is sufficient database information on proteins and ligands with proper selection.
All authors declare that they have no conflict of interest.
M.I. conceived the study. A.M. designed and conducted the computational study, analyzed the data. D.S and M.I supervised the study. A.M. and D.S. wrote the manuscript. All authors approved the final version of the manuscript.
The data supporting the findings of this study are available within the article. More detailed data are available on the data repository website (http://www.bio.chuo-u.ac.jp/iwadate/DrugSearch/).
This research was supported by a grant from the Chuo University Basic Research Fund. We would like to thank Editage (www.editage.com) for English language editing.