Chem-Bio Informatics Journal
Online ISSN : 1347-0442
Print ISSN : 1347-6297
ISSN-L : 1347-0442
calculation report
ChEMBL-Derived Benchmark Dataset and Computational Results of Boltz-2-Based Binding Affinity Prediction
Yugo Shimizu, Tatsuki Akabane, Masateru Ohta, Teruki Honma, Kazuyoshi Ikeda

2026 Volume 26 Pages 11-27

Abstract

Accurate prediction of protein–ligand binding affinities is a critical step in accelerating drug discovery, as it reduces experimental costs and development times. Recently developed co-folding AI models predict how multiple biomolecules fold and interact with each other in three-dimensional space. The emergence of a new co-folding model, Boltz-2, has made highly accurate and efficient prediction of protein–ligand binding affinities increasingly feasible. However, the generalizability and reliability of these models remain unclear because standardized, target-wide benchmark datasets are lacking. In this study, we constructed an independent external benchmark dataset derived from ChEMBL version 35 to rigorously evaluate Boltz-2's affinity prediction performance. The dataset comprises 356 unique protein targets and 10,933 compounds, carefully selected to ensure no overlap with the Boltz-2 training data. Binding affinity measurements were standardized as pChEMBL values and linked to compound SMILES and protein UniProt accessions. Using this benchmark dataset, we compared the performance of the original Boltz-2 model with that of its NVIDIA Inference Microservice (NIM) implementation. The original Boltz-2 and NIM achieved comparable, moderate predictive performance across targets (mean absolute error of approximately 0.9), while NIM reduced computation time by approximately 60–90%. Error analysis showed no clear correlation between prediction errors and the sequence or compound novelty relative to the Boltz-2 training data, underscoring the model's broad coverage. This work provides a transparent and reproducible benchmark for evaluating AI-driven affinity prediction models and offers valuable insights into Boltz-2's applicability, limitations, and potential as a practical tool for data-driven drug discovery.
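The standardization and error metric described in the abstract can be illustrated with a minimal Python sketch. ChEMBL defines a pChEMBL value as the negative base-10 logarithm of a molar activity (such as an IC50, Ki, Kd or EC50). The sketch assumes that observed activities arrive as a value plus a concentration unit. It also assumes that model predictions are expressed as log10 of an IC50 in micromolar units and maps them onto the same -log10(molar) scale. That output convention is an assumption here and should be checked against the Boltz-2 documentation. The function names (`pchembl_from_activity`, `pic50_from_log10_um`, `mae_summary`) and the target labels are hypothetical, and the records are toy values, not data from the benchmark. The abstract does not say whether the reported MAE is pooled over all compounds or averaged per target, so the sketch computes both.

```python
"""Hypothetical sketch: pChEMBL-style standardization and MAE summaries."""
import math
from collections import defaultdict
from statistics import mean

# Factors converting a concentration unit to mol/L (unit labels are illustrative).
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pchembl_from_activity(value: float, unit: str) -> float:
    """Return -log10(activity in mol/L), the quantity a pChEMBL value reports."""
    if value <= 0:
        raise ValueError("activity must be positive")
    return -math.log10(value * TO_MOLAR[unit])


def pic50_from_log10_um(log10_ic50_um: float) -> float:
    """Map log10(IC50 / uM) onto the -log10(molar) scale.

    Assumption: the model reports log10 of an IC50 in micromolar units.
    Since IC50[M] = IC50[uM] * 1e-6, -log10(IC50[M]) = 6 - log10(IC50[uM]).
    """
    return 6.0 - log10_ic50_um


def mae_summary(records):
    """records: iterable of (target_id, observed_p, predicted_p) tuples.

    Returns (per-target MAE dict, mean of per-target MAEs, pooled MAE).
    """
    by_target = defaultdict(list)
    for target, observed, predicted in records:
        by_target[target].append(abs(observed - predicted))
    per_target = {t: mean(errs) for t, errs in by_target.items()}
    macro = mean(per_target.values())  # every target weighted equally
    pooled = mean(e for errs in by_target.values() for e in errs)  # every compound weighted equally
    return per_target, macro, pooled


# Toy records (not benchmark data): observed value/unit and a hypothetical prediction.
records = [
    ("target_A", pchembl_from_activity(10, "nM"), pic50_from_log10_um(-2.5)),
    ("target_A", pchembl_from_activity(1, "uM"), pic50_from_log10_um(0.0)),
    ("target_B", pchembl_from_activity(100, "nM"), pic50_from_log10_um(-0.8)),
]
per_target, macro_mae, pooled_mae = mae_summary(records)
print(per_target, round(macro_mae, 3), round(pooled_mae, 3))
# per-target MAE is about 0.25 (A) and 0.2 (B); macro is about 0.225, pooled about 0.233
```

Averaging per target first keeps targets with many measured compounds from dominating the summary, whereas the pooled MAE weights every compound equally. With hundreds of targets of very different sizes, the two figures can differ noticeably.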

This article is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The images, videos, or other third-party material in this article are also included in the article's Creative Commons license. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (Japanese-language deed: https://creativecommons.org/licenses/by/4.0/deed.ja).