Biological and Pharmaceutical Bulletin
Online ISSN : 1347-5215
Print ISSN : 0918-6158
ISSN-L : 0918-6158
Review
Development of a Data-Driven Prediction Model of Adverse Drug Reactions Using Large-Scale Medical Information and Machine Learning
Kaori Ambe
Journal, Open Access

2026, Volume 49, Issue 2, pp. 213-219

Abstract

In the development of pharmaceuticals and other chemical substances, it is important to evaluate their efficacy and safety. There is a growing trend toward reducing reliance on traditional in vivo animal testing for safety assessments and toward using new evaluation methods, such as in vitro and in silico testing, to refine human safety assessments. Furthermore, in medical and environmental fields, there is a growing demand for the utilization of vast amounts of information. This has led to the development of data-driven approaches that utilize large-scale medical information and artificial intelligence (AI). Machine learning enables computers to learn from known data, discover new patterns, and make predictions for unseen data. This technology is also useful for in silico prediction of chemical toxicity and adverse reactions in humans. Recently, explainable AI, which presents the basis for predictions obtained from machine learning models in a user-understandable manner, has attracted attention as a useful technology for decision-making support. We have developed machine learning models based on a quantitative structure–activity relationship approach to predict toxicity and adverse reactions from the structural information of chemical substances. Furthermore, we have begun to develop a model to predict package insert revisions based on post-marketing adverse reaction information. These efforts will contribute to solving regulatory science issues regarding the appropriate use of chemical substances such as pharmaceuticals.

1. INTRODUCTION

In drug development, new candidate substances based on diverse modalities continue to emerge as needs diversify with scientific and technological advances. However, safety issues remain a major cause of clinical trial termination and market withdrawal.1) Traditionally, chemical safety assessments have been conducted using toxicity tests in experimental animals. However, owing to concerns regarding animal welfare, cost, and time, interest in new safety assessment methods that do not rely on animal testing has increased. Traditional animal models often fail to accurately reflect efficacy and safety in humans and have been criticized for their limited predictability. Recent advances in science and technology have led to the development of advanced in vitro models using human-derived cells and tissues (e.g., organoid technology, organ-on-a-chip, and three-dimensional culture systems) and in silico computer-based models (e.g., machine learning and mathematical models), making it increasingly possible to reproduce human biological responses.2)

Therefore, there is a growing need to evaluate the safety of chemicals without animal models. Evaluation methods that do not use animal testing, known as New Approach Methodologies (NAMs), have the potential to predict the effects of chemicals on humans more accurately. They are also expected to reduce costs and time and to improve animal welfare. In the cosmetics industry, animal testing for cosmetic development has been banned in Europe since 2013, leading to the development and practical application of alternative methods.3) In drug development, the “FDA Modernization Act 2.0,” enacted in the United States in 2022, clarified that non-animal testing data can be used in clinical trial applications for drugs.4) Furthermore, in April 2025, the U.S. Food and Drug Administration (FDA) announced a roadmap to promote the use of NAMs and gradually eliminate animal testing requirements for certain drugs such as monoclonal antibodies.5) Although chemical safety assessment is being transformed, in vitro and in silico NAMs each have technical limitations, and challenges remain, such as difficulty in reproducing certain toxicity parameters and ensuring the reliability of prediction results. Therefore, toxicity prediction approaches that utilize artificial intelligence (AI) and toxicity-related ‘big data’ are highly anticipated. Machine learning enables predictions for new data by effectively utilizing known data and is therefore expected to contribute to the efficiency of experimental research as a data-driven toxicity prediction method.6,7)

From a regulatory science perspective, I am developing in silico approaches that could make safety assessments more efficient by prioritizing animal testing and by assessing the risk of impurities that are difficult to synthesize (Fig. 1). We focused on the (quantitative) structure–activity relationship ((Q)SAR) approach, which predicts toxicity from chemical structure information, and developed a machine learning prediction model. For a wide variety of chemicals, the chemical structure information was quantified using molecular descriptor calculation software, the resulting descriptors were used as explanatory variables, and a machine learning algorithm was trained to predict the target variable, toxicity. This makes it possible to predict toxicity whenever chemical structure information is available.8) We also applied the (Q)SAR approach to develop an adverse drug reaction (ADR) prediction model that utilizes large-scale medical information and machine learning. Our findings suggest that combining large-scale post-market adverse reaction information with machine learning could provide a new approach to drug safety research.9) Furthermore, we extended the ADR prediction model and developed a predictive model to support package insert revisions.10,11) Using the number and percentage of reported adverse reactions in an ADR database, this model is expected to serve as a decision-making support tool for safety measures based on accumulated data.

Fig. 1. Combining Regulatory Science and Data Science

I am working to develop new approaches to chemical safety research that combine the fundamentals of pharmaceutical research (organic chemistry, toxicology, and drug informatics) with the latest machine learning techniques.

Recently, explainable AI, which provides the basis for predictions obtained from machine learning models in a manner that users can understand, has also attracted attention as a useful technology for decision-making support. Furthermore, to utilize the appropriate information for each issue from the vast amount of information updated daily in the medical and health fields, data science is becoming increasingly important, as it uses statistics and computer science to discover meaningful rules and relationships from data and combines this with specialized knowledge to derive insights useful for problem solving. The main data-driven research outcomes related to the appropriate use of pharmaceuticals and risk minimization are described below.

2. PREDICTING THE TOXICITY OF CHEMICAL SUBSTANCES USING DATABASES

2.1. Prediction Model for Chemical-Induced Hepatocellular Hypertrophy

Chemical-induced hepatocellular hypertrophy is a common finding in many animal studies and is often used to determine the No-Observed-Adverse-Effect Level (NOAEL) in toxicity tests. However, its mechanism is complex, and the wide variety of chemical structures that induce liver hypertrophy makes it difficult to predict. Therefore, to predict hepatocellular hypertrophy for a variety of chemicals without fully understanding the underlying mechanism, we focused on chemical structures and developed a machine learning model based on (Q)SAR.8) We used data on pesticides, food additives, and veterinary drugs extracted from assessment reports published by the Food Safety Commission of Japan (http://www.fsc.go.jp/english/index.html, accessed January 2015) and freely available industrial chemical data (the Hazard Evaluation Support System Integrated Platform (HESS) dataset) obtained from the National Institute of Technology and Evaluation (http://www.nite.go.jp/en/chem/qsar/hess-e.html, accessed January 2016), and collected data on rats exposed for 28 d or more. Chemical substances that exhibited hepatocellular hypertrophy, regardless of exposure duration, were defined as “hepatocellular hypertrophy positive.” Chemical substances that did not demonstrate hepatocellular hypertrophy, even after 90 d or more of exposure, were defined as “hepatocellular hypertrophy negative.” The pesticide dataset comprised 346 data points related to hepatocellular hypertrophy, including 122 positive and 224 negative chemical substances. The HESS dataset comprised 503 data points, including 339 positive and 164 negative chemical substances. Molecular descriptors were calculated from the extracted chemical structure information using DRAGON6 (Talete Srl., Milano, Italy). Approximately 500 molecular descriptors were used as variables in the prediction model. 
The chemical substances in the dataset were randomly divided into training and test datasets at a 1 : 1 ratio. Prediction models were constructed using deep learning, random forest, and support vector machines (SVM). Model performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC) and prediction accuracy. The best-performing model was an SVM model that used only the HESS dataset. It predicted 214 test chemicals within its applicability domain (AD), achieving a prediction accuracy of 0.76, a sensitivity of 0.90, and an ROC-AUC of 0.81. In addition to predictive performance, prediction models must yield reliable results. Defining the AD clarifies the scope of the model and improves the reliability of its predictions. The Organisation for Economic Co-operation and Development (OECD) (Q)SAR validation principles require a clearly defined domain of applicability for (Q)SAR predictive models.12) In this model, we used Euclidean distances to define the chemical space covered by the predictive model.13) This model enabled the prediction of hepatocellular hypertrophy from chemical structure information alone, potentially facilitating the development of in silico models for toxicity prediction.
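As a rough illustration of a distance-based AD, the following Python sketch treats a query chemical as inside the domain when its mean Euclidean distance to its k nearest training chemicals does not exceed a threshold. The descriptor vectors, the value of k, the threshold, and all function names are hypothetical; the text states only that Euclidean distances were used, not this particular rule.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_knn_distance(query, training, k=3):
    """Mean distance from `query` to its k nearest training chemicals."""
    nearest = sorted(euclidean(query, t) for t in training)[:k]
    return sum(nearest) / len(nearest)

def in_applicability_domain(query, training, threshold, k=3):
    """True when the query lies close enough to the training chemical space."""
    return mean_knn_distance(query, training, k) <= threshold

# Hypothetical, already-scaled descriptor vectors (not data from the study).
training_set = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
inside = in_applicability_domain([0.5, 0.5], training_set, threshold=1.0)
outside = in_applicability_domain([5.0, 5.0], training_set, threshold=1.0)
print(inside, outside)
```

In practice the descriptors would need to be scaled before distances are computed, and the threshold would typically be derived from the distribution of distances within the training set.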

2.2. Comparison of Phthalate Ester Toxicity

Phthalate esters with various side chains are widely used as plasticizers but are known to induce liver cancer and affect reproductive function. Therefore, we focused on the effects of phthalate esters on the liver and reproductive system, collected and organized toxicity studies from the literature, and compared the toxicological characteristics of each phthalate ester.14) Toxicity data were collected for six phthalates: n-butylbenzyl phthalate (BBP), di-n-butyl phthalate (DBP), di(2-ethylhexyl) phthalate (DEHP), di-isodecyl phthalate (DIDP), di-isononyl phthalate (DINP), and di-n-octyl phthalate (DNOP). We also compiled toxicity study information (e.g., animal species, strain, sex, and the findings on which the NOAEL/Lowest-Observed-Adverse-Effect Level (LOAEL) was established) from reproductive and developmental toxicity studies and repeated-dose toxicity studies. Next, we extracted the lowest NOAEL/LOAEL from studies that showed liver and reproductive system toxicity and compared them for each phthalate ester. In reproductive and developmental toxicity tests, when toxicity findings in the reproductive system of rats were used as an indicator, DBP and DEHP showed the lowest NOAEL/LOAEL; however, when toxicity findings in the rat liver were used as an indicator, the NOAEL/LOAEL did not show any significant differences between phthalates. Furthermore, in repeated-dose toxicity tests in rats, when liver toxicity was used as an indicator, the NOAEL/LOAEL did not show any significant differences between the phthalates. The effects of phthalate esters on the reproductive system were found to be related to side chain length, with short-chain phthalates being more toxic than long-chain phthalates. However, no tendency related to side chains was observed in the effects on the liver. These results suggest that the relationship between the side chains of phthalate esters and toxic effects differs between the liver and reproductive system.

2.3. Prediction of Inhibitory Activity of Chemical Substances against CYPs

Reactivity with cytochrome P450 (CYP) enzymes is involved in the manifestation of toxicity, such as liver damage caused by chemicals. In this study, we measured the reactivity of chemicals with rat and human CYPs using in vitro inhibitory activity as an index and developed an in silico method to predict CYP inhibitory activity using machine learning based on the measured data.15) For model construction, chemical substances were selected from the HESS database (https://www.nite.go.jp/en/chem/qsar/hess-e.html; accessed August 2023). To evaluate the generalizability of the model, an external validation dataset was constructed from substances randomly selected from those registered under the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) regulation (https://echa.europa.eu/information-on-chemicals/registered-substances, accessed August 2023). We used 326 substances for model construction and internal validation and a further 60 substances as the external validation dataset. We focused on seven rat P450s (CYP1A1, CYP1A2, CYP2B1, CYP2C6, CYP2D1, CYP2E1, and CYP3A2) and 11 human P450s (CYP1A1, CYP1A2, CYP1B1, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4). The CYP inhibitory activity of each test substance was measured in vitro using recombinant enzymes and a luminescent substrate (P450-Glo system).16,17) A substance was classified as reactive when its maximum inhibitory activity in the in vitro system was 15% or higher and as non-reactive when it was < 15%. Using molecular descriptors calculated with mordred (mordred 2019.01)18) as chemical structure information, a binary classification model was constructed using XGBoost to determine whether a substance is reactive. Furthermore, the AD was defined to show the range of chemicals for which reliable predictions could be obtained. 
Using the values measured in vitro, we built a machine learning classification model of the inhibitory activity of chemicals against each CYP. For substances within the AD, most models showed an ROC-AUC of 0.8 or higher in internal validation and 0.7 or higher in external validation. Models that predict the inhibitory activity of various P450s in both rats and humans from chemical structure information make it possible to examine the similarities and differences in chemical-induced toxicity between species from structural information alone, without the need for experiments.
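The labeling rule and the evaluation metric described above can be sketched in a few lines of Python. The 15% cutoff comes from the text; the inhibition values, model scores, and function names below are hypothetical and serve only to show how ROC-AUC is obtained as the probability that a randomly chosen reactive substance is scored above a randomly chosen non-reactive one.

```python
def label_reactive(max_inhibition_percent, cutoff=15.0):
    """Return 1 (reactive) when maximum inhibition is at or above the cutoff."""
    return 1 if max_inhibition_percent >= cutoff else 0

def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute ROC-AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical maximum inhibition values (%) and model scores, for illustration.
max_inhibition = [62.0, 15.0, 40.0, 14.9, 3.0, 0.0]
y_true = [label_reactive(v) for v in max_inhibition]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(y_true, y_score)
print(y_true, round(auc, 3))
```

The pairwise form is quadratic in the number of substances, which is acceptable for datasets of a few hundred chemicals; library implementations use a rank-based computation instead.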

3. DEVELOPMENT OF NEW SAFETY ASSESSMENT METHODS THAT DO NOT RELY ON ANIMAL TESTING

Allergic contact dermatitis is a type IV delayed allergic reaction mediated by immune cells such as T lymphocytes. Repeated exposure to chemicals can cause skin conditions such as redness and rashes, significantly reducing quality of life (QOL). Skin sensitizers that cause skin conditions through repeated exposure are occasionally used in cosmetics and quasi-drugs. Therefore, a safety assessment of the chemical substances contained in these products is crucial to ensuring consumer safety. However, animal-based safety assessment methods are being reconsidered, particularly in the cosmetics industry, and NAMs are being developed for regulatory assessments that do not rely on animal testing. We focused on the local lymph node assay (LLNA), an internationally recommended method for assessing skin sensitization potential, and developed an in silico alternative. Using a reliable, publicly available database and machine learning, we constructed a regression model to quantitatively predict the estimated concentration needed to produce a stimulation index of 3 (EC3) obtained from LLNA, an index of skin sensitization potency.19,20) Using the Cosmetics Europe database,21) we used the LLNA EC3 value as the objective variable and, as explanatory variables, information such as physical properties, chemical structure information, and in chemico/in vitro experimental values (DPRA, KeratinoSens™, h-CLAT, U-SENS™, and SENS-IS) related to the adverse outcome pathway (AOP) for skin sensitization. The machine learning algorithm used was CatBoost, a gradient boosting decision tree algorithm. Gradient boosting decision tree algorithms such as XGBoost and LightGBM can generally handle missing values in the explanatory variables and exhibit excellent predictive performance and interpretability. 
In addition to these features, CatBoost excels in handling categorical variables and is less susceptible to overfitting, resulting in high generalization performance even with small datasets.22,23) Data for 120 substances were used to build the model, with 30 substances randomly selected as validation data and the remaining data used for model training. The root mean square error (RMSE) and coefficient of determination (R²) were used as evaluation indices. The regression model for LLNA EC3 values achieved an RMSE of 0.49 and an R² of 0.74. Furthermore, when the importance of the explanatory variables was calculated, the in vitro experimental values were ranked highly. The in vitro experimental values used in this model represent indicators of the key events in the AOP of skin sensitization. In particular, descriptors related to h-CLAT, which evaluates dendritic cell activation, were highly important and contributed substantially to the prediction. The EC3 value calculated by LLNA is a key potency index for skin sensitization assessment and serves as a criterion in the United Nations Globally Harmonized System of Classification and Labelling of Chemicals, which classifies the hazards of chemical substances. It is also used to calculate the No Expected Sensitization Induction Level (NESIL), an index used in human risk assessment. Quantitative prediction of skin sensitization potency indices using machine learning is important in developing alternative methods to animal testing for skin sensitization assessment and is expected to lead to case study research involving Next Generation Risk Assessment using NAMs.24–26)
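For reference, the two evaluation indices named above can be computed as in the following Python sketch. The observed and predicted values are invented for illustration and are not EC3 data from the study; the text does not state whether EC3 values were transformed (for example, to a logarithmic scale) before modeling, so no transformation is assumed here.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical validation-set values (not taken from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
error = rmse(observed, predicted)
r2 = r_squared(observed, predicted)
print(round(error, 3), round(r2, 3))
```

RMSE is expressed in the units of the target variable, whereas R² is unitless, which is why the two are usually reported together.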

4. DEVELOPMENT OF A MODEL TO PREDICT ADVERSE DRUG REACTIONS

4.1. In Silico Prediction Study of Severe Cutaneous Adverse Reactions Using a Large-Scale Adverse Drug Reaction Database

Globally, regulatory authorities are building large-scale ADR databases by collecting reports from healthcare professionals and pharmaceutical companies regarding ADRs that occur in clinical settings. In Japan, the Pharmaceuticals and Medical Devices Agency (PMDA) publishes the Japanese Adverse Drug Event Report database (JADER). In post-marketing pharmacovigilance, spontaneous ADR report databases are used to detect signals that provide information concerning possible associations between a drug and an adverse event, and this information is used to implement early safety measures. However, such databases cannot be used to determine the frequency of adverse reactions, and results must be interpreted carefully, taking into account reporting biases such as underreporting and the influence of safety information. In this study, we combined JADER with machine learning to develop an in silico predictive model that identifies the risk of severe drug-induced rash from the chemical structure information of drugs.9) Severe drug-induced rashes, such as Stevens–Johnson syndrome and toxic epidermal necrolysis (SJS/TEN), are rare but can occur after a drug is marketed. Therefore, utilizing a post-marketing ADR database is considered useful. We used adverse event report data registered in JADER from the first quarter of 2004 to the second quarter of 2017 (https://www.pmda.go.jp/safety/info-services/drugs/adr-info/suspected-adr/0003.html, accessed on February 6, 2018). The target ADRs were severe cutaneous adverse reactions (SCARs), defined according to the Medical Dictionary for Regulatory Activities (MedDRA), the standardized international medical terminology. 
SCAR-positive and SCAR-negative drugs were defined using the Proportional Reporting Ratio (PRR), a signal detection method, together with the number of reports, and the corresponding drugs were extracted. A total of 185 SCAR-positive and 195 SCAR-negative drugs were identified. Molecular descriptors were calculated from the chemical structure information of the drugs using Dragon 7 (Talete Srl.). The SCAR-positive and SCAR-negative drugs were randomly divided 1 : 1 into training and test sets. Using the calculated molecular descriptors as variables, an adverse reaction prediction model was constructed using deep learning to determine the risk of SCAR events. The performance of the prediction model was evaluated using the ROC-AUC of the predictions for drugs within the AD of the test set. The prediction model, which used only chemical structure information as variables, predicted the risk of SCAR development with an ROC-AUC of 0.76. This method, which combines JADER with machine learning, is expected to be useful for efficient screening in drug development and as a supporting method for identifying drugs that might cause adverse reactions in clinical settings.
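The PRR used for labeling can be computed from a 2x2 table of spontaneous reports, as in the Python sketch below. The report counts are invented, and the thresholds in `is_signal` (PRR of at least 2 with at least 3 reports) are placeholder values that appear in commonly cited screening conventions; the criteria actually applied in the study are not given in this text.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of spontaneous reports.

    a: reports of the target event for the drug of interest
    b: reports of all other events for the drug of interest
    c: reports of the target event for all other drugs
    d: reports of all other events for all other drugs
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined for this table")
    return (a / (a + b)) / (c / (c + d))

def is_signal(a, b, c, d, min_prr=2.0, min_reports=3):
    """Placeholder screening rule; thresholds are assumptions, not the study's."""
    return a >= min_reports and proportional_reporting_ratio(a, b, c, d) >= min_prr

# Hypothetical counts: 20 of 1000 reports for the drug mention the event,
# versus 200 of 99000 reports for all other drugs.
prr = proportional_reporting_ratio(20, 980, 200, 98800)
print(round(prr, 2), is_signal(20, 980, 200, 98800))
```

Because the PRR is a ratio of reporting proportions rather than incidence rates, a high value indicates disproportionate reporting and not the frequency of the reaction.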

4.2. Machine Learning Model to Support Post-market Safety Measures

In post-market safety surveillance of pharmaceuticals, revisions to package inserts play a crucial role as safety measures for medical professionals. The PMDA and pharmaceutical companies collaborate to conduct any necessary revisions, considering post-marketing ADR data and other information. However, this process requires considerable resources and time. Statistical signal detection techniques based on disproportionality analysis, such as the PRR and Reporting Odds Ratio (ROR), are used to detect the risk signals of unknown ADRs, but these techniques have limitations, including a high incidence of false positives. Therefore, we focused on additions to the “serious ADR” section of package inserts and developed a predictive model capable of detecting such revisions early. We first focused on drug–ADR pairs added because of the accumulation of ADR cases in Japan, which is the most frequently cited reason for additions to the “serious ADR” section. The positive examples were drug–ADR pairs added to the “serious ADR” section due to the accumulation of domestic ADR cases between August 2011 and March 2020. The negative examples were drug–ADR pairs for which the serious ADR in question was not listed in the package insert as of March 2020 but for which at least one case had been reported. The JADER dataset used to build the prediction model comprised 34 features, including the cumulative number of ADR cases, cumulative deaths, and cumulative relapses after re-administration, calculated as of six months before the time of addition. Using an SVM with a radial basis function kernel, the model achieved a Matthews correlation coefficient (MCC), a performance index for binary classification that is suited to imbalanced data, of 0.92 for the test data.10) Next, we focused on other reasons for additions to the “serious ADR” section, such as ADR information from overseas and revisions to the Company Core Data Sheet, the standard document for creating package inserts in each country. 
In addition to JADER, we used the FDA Adverse Event Reporting System (FAERS), a database of ADR reports from around the world, as a new data source for feature creation. A generalized linear model achieved an average MCC of 0.87 in cross-validation, indicating high predictive performance.11) We believe that using these two machine learning models in a complementary manner can support decisions on whether safety measures are necessary.
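To show why the MCC is preferred over accuracy for imbalanced data, the following Python sketch computes both from a confusion matrix. The counts are hypothetical and unrelated to the study's results; returning 0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math

def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical imbalanced test set: 10 positive and 90 negative pairs.
mcc_model = matthews_corrcoef(tp=8, fp=2, fn=2, tn=88)
# A trivial classifier that labels every pair as negative.
mcc_trivial = matthews_corrcoef(tp=0, fp=0, fn=10, tn=90)
acc_trivial = accuracy(tp=0, fp=0, fn=10, tn=90)
print(round(mcc_model, 3), mcc_trivial, acc_trivial)
```

The trivial classifier reaches an accuracy of 0.9 on this hypothetical set while its MCC is 0, which is the behavior that makes the MCC informative when negative pairs greatly outnumber positive ones.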

4.3. Prediction Model for Acute Kidney Injury Using Electronic Medical Records Information

Cisplatin has been reported to cause acute kidney injury (AKI) at a relatively high frequency, and early prediction of patients at high risk of AKI and early therapeutic intervention are necessary. Therefore, we developed a model for the early prediction of cisplatin-induced AKI using machine learning and electronic medical record information, including patient background and laboratory test values.27,28) The study included hospitalized patients aged ≥ 18 years who received their first course of cisplatin chemotherapy from January 1, 2011, to December 31, 2020, at Nagoya City University Hospital. The study protocol and waiver of consent were approved by the Nagoya City University Institutional Review Board (Approval No. 60-21-0053), and this study was conducted in accordance with the Declaration of Helsinki. Patients who met the serum creatinine criteria for AKI of the Kidney Disease: Improving Global Outcomes (KDIGO) guidelines29) within 14 d of the final cisplatin administration in the first course were considered positive. Patients who received cisplatin but did not develop AKI were considered negative. A binary classification model was constructed to distinguish between positive and negative patients. Twenty-nine features, including laboratory test values, concomitant medications, medical history, and cisplatin administration information, were used as explanatory variables. A model for predicting AKI onset was constructed using CatBoost, a gradient boosting decision tree algorithm. Of the patients included in the observation period, 1253 (119 positive, 1134 negative) were used to construct the model. The sensitivity and specificity of the constructed predictive model were 0.81 and 0.66, respectively, for the training data and 0.88 and 0.57 for the test data, demonstrating its high sensitivity. Furthermore, the ROC-AUC for the test data was 0.78, indicating a good ability to predict the onset of AKI. 
Furthermore, using SHapley Additive exPlanations (SHAP), a method that supports clinical interpretation of explanatory variables by calculating and visualizing the magnitude and direction of their contributions,30) we showed that variables such as the concomitant use of magnesium preparations and total cisplatin dose contributed substantially to the prediction of cisplatin-induced AKI. The direction of contribution indicated that the concomitant use of magnesium preparations was associated with a lower predicted risk of cisplatin-induced AKI. This model is expected to become a non-invasive method for efficiently supporting early therapeutic interventions for cisplatin-induced AKI in clinical settings.
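SHAP is built on Shapley values, which distribute the difference between a prediction and a reference prediction across the explanatory variables. The Python sketch below computes exact Shapley values for a toy risk score with two features. The model, its coefficients, the feature names, and the patient values are all invented for illustration; replacing absent features with fixed reference values is a simplification of how SHAP averages over background data, and this is not the CatBoost model described above.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`.

    Features outside a coalition take their baseline value.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return model(mixed)

    phi = {}
    for feature in names:
        others = [f for f in names if f != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude `feature`.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = value(set(subset) | {feature}) - value(set(subset))
                total += weight * gain
        phi[feature] = total
    return phi

def toy_risk(x):
    """Hypothetical linear risk score; coefficients are invented."""
    return 0.10 + 0.002 * x["total_cisplatin_mg"] - 0.05 * x["magnesium_given"]

patient = {"total_cisplatin_mg": 120.0, "magnesium_given": 1.0}
reference = {"total_cisplatin_mg": 80.0, "magnesium_given": 0.0}
contributions = shapley_values(toy_risk, patient, reference)
print(contributions)
```

In this toy case the contribution of the magnesium feature is negative, meaning it lowers the score relative to the reference, and the contributions sum to the difference between the patient's score and the reference score. A negative contribution describes the model's behavior and does not by itself establish a causal protective effect.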

5. CONCLUSION

Through the above research activities, we established a new approach to chemical safety research by combining the foundations of pharmaceutical research (organic chemistry, toxicology, and drug informatics) with the latest machine learning technology and developing it into regulatory science research. Risk assessments using structural information of chemical substances have contributed to the development of new assessment methods that do not rely on animal testing. In addition, we developed a model to quickly predict package insert revisions, which are central to post-marketing safety measures, and worked on building an integrated adverse reaction predictive model that incorporates patient background, drug treatment history, and laboratory test results extracted from electronic medical records. As the medical and environmental fields require greater utilization of the vast amount of information that is updated daily, I will continue to develop data-driven approaches that utilize large-scale medical information and machine learning to avoid toxicity and adverse reactions, contributing to the advancement of pharmaceutical and data science research, which is important for improving human health.

Acknowledgments

I would like to express my deep gratitude to Prof. Masahiro Tohkin (Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University) for support and encouragement; Dr. Takao Ashikaga, Dr. Takashi Yamada, Dr. Midori Yoshida, and Dr. Kaoru Inoue (National Institute of Health Sciences); Prof. Kouichi Yoshinari and Dr. Takamitsu Sasaki (Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka); Prof. Takayuki Hamano and Dr. Miho Murashima (Department of Nephrology, Nagoya City University Graduate School of Medical Sciences); Prof. Kazunori Kimura and Prof. Yoko Hibi (Nagoya City University Graduate School of Medical Sciences); Dr. Chiharu Wachino and Prof. Masahiro Kondo (Nagoya City University East Medical Center) for technical assistance with the experiments and discussion; Dr. Takashi Watanabe (Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University) for collaboration. I am deeply grateful to my colleagues and students in the Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University. 
This work was supported by a Grant from the Research Foundation for Pharmaceutical Sciences, the Japan Society for the Promotion of Science (JSPS) KAKENHI program (Grant Numbers: JP18K14987, JP20K1605, and JP23K06133), a Grant from the Food Safety Commission, Cabinet Office, Government of Japan (Research program for Risk Assessment Study on Food Safety 1303, 1602, 1801, and 2301), Grant-in-Aid for Research in Nagoya City University (Grant Numbers: 2121103 and 1922008), Health and Labor Sciences Research Grants from the Ministry of Health, Labour and Welfare, Japan (Grant Numbers: 21KD2005, 24KD2002, and 24KD2004), the AI-based substance hazard integrated prediction system (AI-SHIPS) project of the Ministry of Economy, Trade, and Industry of Japan, the Japanese Society for Alternatives to Animal Experiments Foundation NGRA program, and a Grant-in-Aid for Outstanding Research Group Support Program at Nagoya City University (Grant Number: 2530002).

DECLARATION

Conflict of Interest

The author declares no conflict of interest.

Notes

This review of the author’s work was written by the author upon receiving the 2025 Pharmaceutical Society of Japan Incentive Award for Women Scientists.

REFERENCES
 
© 2026 The Author(s).
Published by The Pharmaceutical Society of Japan.

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.
https://creativecommons.org/licenses/by-nc/4.0/