2021 Volume 21 Pages 70-80
Inhibition of the hERG potassium channel is closely associated with QT interval prolongation, and assessing this risk can contribute greatly to the development of safer therapeutic compounds. In the hit-to-lead optimization stage of drug development, quantitative prediction of hERG inhibitory activity is crucial for designing drug candidates free of cardiotoxicity risk. Here, we developed a hERG regression model that combines support vector regression (SVR) with descriptor selection by the non-dominated sorting genetic algorithm II (NSGA-II), based on the AMED cardiotoxicity database, which contains hERG blocking information compiled by integrating public and commercial databases. To construct the regression model, 6,561 compounds with IC50 and/or Ki values were extracted from the AMED cardiotoxicity database and randomly split into a training set (70%) for model building and a test set (30%) for performance evaluation. To avoid overfitting caused by many irrelevant explanatory variables, NSGA-II, a variant of the genetic algorithm for multi-objective optimization, was used for descriptor selection so as to simultaneously maximize Q2 and minimize RMSE in 5-fold cross-validation while minimizing the number of descriptors used. The prediction performance was then compared with that of ADMET Predictor, a commercial software package providing various ADMET property predictions. The SVR model achieved an R2 of 0.594 and an RMSE of 0.604 on the test set, clearly outperforming ADMET Predictor (0.134 and 0.690, respectively). The regression model is available at our home page (https://drugdesign.riken.jp/hERG).
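The three-objective descriptor selection described in the abstract rests on Pareto dominance: one descriptor subset is preferred over another only if it is no worse on Q2, RMSE, and descriptor count, and strictly better on at least one. The following is a minimal sketch of that ranking step, not the authors' implementation. The `Candidate` type, the function names, and all numeric values are hypothetical and chosen only for illustration. It uses simple repeated filtering rather than the fast non-dominated sort of NSGA-II, and it omits crowding distance, crossover, mutation, and the SVR training itself.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One descriptor subset with its cross-validation scores (made-up values)."""
    name: str
    q2: float           # cross-validated Q2, to be maximized
    rmse: float         # cross-validated RMSE, to be minimized
    n_descriptors: int  # number of selected descriptors, to be minimized


def objectives(c):
    # Express every objective as "smaller is better" by negating Q2.
    return (-c.q2, c.rmse, c.n_descriptors)


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    fa, fb = objectives(a), objectives(b)
    no_worse = all(x <= y for x, y in zip(fa, fb))
    strictly_better = any(x < y for x, y in zip(fa, fb))
    return no_worse and strictly_better


def non_dominated_sort(population):
    """Group candidates into successive Pareto fronts (front 0 is the best)."""
    remaining = list(population)
    fronts = []
    while remaining:
        # A candidate belongs to the current front if nothing left dominates it.
        front = [c for c in remaining
                 if not any(dominates(other, c) for other in remaining)]
        fronts.append(front)
        remaining = [c for c in remaining if c not in front]
    return fronts


# Hypothetical population of four descriptor subsets.
population = [
    Candidate("A", q2=0.60, rmse=0.60, n_descriptors=40),
    Candidate("B", q2=0.58, rmse=0.62, n_descriptors=25),
    Candidate("C", q2=0.55, rmse=0.65, n_descriptors=60),
    Candidate("D", q2=0.58, rmse=0.63, n_descriptors=30),
]

fronts = non_dominated_sort(population)
front_names = [[c.name for c in front] for front in fronts]
print(front_names)
```

In this toy population, A and B trade predictive accuracy against descriptor count, so neither dominates the other and both land in the first front. A genetic algorithm of this family would favor such first-front subsets when selecting parents for the next generation.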