Streamlining Considerations for Safety Measures: A Predictive Model for Addition of Clinically Significant Adverse Reactions to Japanese Drug Package Inserts

Takashi Watanabe; Kaori Ambe; Masahiro Tohkin

doi:10.1248/bpb.b23-00846

Abstract

The addition of clinically significant adverse reactions (CSARs) to Japanese package inserts (PIs) is an important safety measure that informs medical personnel of potential health risks; however, determining whether an addition is necessary can be a lengthy and complex process. Therefore, we aimed to construct a machine learning-based model that predicts, at an early stage, CSAR additions resulting from the accumulation of both Japanese and overseas adverse drug reaction (ADR) cases. The targets comprised CSARs added to PIs from August 2011 to March 2022. The control group consisted of drugs whose PIs did not contain the same CSARs as of March 2022. Features were generated from ADR case accumulation data obtained from the Japanese Adverse Drug Event Report (JADER) database and the U.S. Food and Drug Administration Adverse Event Reporting System (FAERS). The model was constructed using DataRobot, and its performance was evaluated using the Matthews correlation coefficient (MCC). The targets included 414 CSAR additions: 302 due to domestic case accumulation, 22 due to both domestic and overseas case accumulation, 12 due to overseas case accumulation, and 78 due to revisions of the company core data sheet. The best model was a generalized linear model (GLM) blender using informative features, which achieved MCCs of 0.8754 in cross-validation and 0.8995 on the holdout set. In conclusion, the proposed model effectively predicted CSAR additions to PIs resulting from the accumulation of ADR cases using data from both Japan and the United States.

INTRODUCTION

Under pharmaceutical regulations in Japan, the Pharmaceuticals and Medical Devices Agency (PMDA) centrally gathers post-marketing safety information, including adverse effects and infectious disease information reported by manufacturers, medical institutions, and patients; research reports in medicine and pharmacy; and safety measure information from overseas regulatory authorities. The PMDA receives consultations from pharmaceutical companies regarding package insert (PI) revisions for various medications available in Japan. It also evaluates the collected post-marketing information and discusses whether safety measures are necessary. Based on the results of these discussions, the Ministry of Health, Labour and Welfare issues notifications on safety measures, such as PI revisions.^1–3)

The PI is a public document stipulated under Japanese law on ensuring the quality, efficacy, and safety of pharmaceutical products, and its revision is a critical safety measure. Most revisions concern “clinically significant adverse reactions (CSARs)”; according to previous research, 81% of the instructions for the revision of precautions in PIs were related to CSAR addition.⁴⁾ Of these, 63% cited the accumulation of domestic adverse drug reaction (ADR) case information as the reason for CSAR addition, whereas 37% cited revision of the company core data sheet (CCDS), accumulation of overseas ADR case information, information on similar drugs, revisions of PIs in Europe and the United States, or information from clinical trials, academic societies, guidelines, and the literature. Adding CSARs is a crucial safety measure for alerting medical personnel to significant new risks; however, determining whether an addition is necessary is a lengthy and intricate process.^2,3) Disproportionality analysis is used in this process to detect risk candidates efficiently, and many such methods have been studied.^5–10) Regulatory authorities and marketing authorization holders (MAHs) currently employ these methods, which has contributed greatly to the efficiency of safety measure considerations. However, because disproportionality analysis aims to identify drug–ADR risk candidates comprehensively, it yields many false positives, and validating and evaluating the detected signals imposes a substantial burden. In addition, deciding whether safety measures are needed requires numerous expert reviews, which is costly.

In our previous study, we constructed a machine learning (ML) model that directly predicts the addition of CSARs to Japanese PIs due to the accumulation of Japanese ADR case reports, with the aim of streamlining the workflow for considering the need for safety measures.¹¹⁾ That model showed extremely high predictive performance, with a Matthews correlation coefficient (MCC) exceeding 0.9, and identified important features for predicting CSAR additions due to the accumulation of domestic ADR case reports. Despite this high performance, the model had a limitation: it could predict only CSAR additions based on the accumulation of domestic ADR cases and could not handle situations in which CSARs are added to the PI because of the accumulation of overseas ADR case reports. Because CSARs are added to PIs for various reasons, a predictive model that handles these different patterns would be more practical and could further improve the efficiency of considering the need for safety measures. Therefore, this study aimed to construct a practical ML model capable of predicting CSAR additions at an early stage based on the accumulation of ADR cases, not limited to domestic ones, with the goal of further streamlining the workflow for considering the necessity of safety measures.

MATERIALS AND METHODS

Data Source and Study Population

In Japan, PIs are revised because of accumulated domestic and overseas ADR cases, revision of the CCDS, literature information, epidemiological data, and requests from academic societies, among other reasons. According to a previous study, literature, epidemiological data, and requests from academic societies accounted for only 2.1% (13/618) of the reasons for adding CSARs to PIs,¹¹⁾ so these were expected to have minimal impact on the predictive model. Therefore, this study focused on the instructions for the revision of precautions for use in drug PIs posted on the PMDA website (https://www.pmda.go.jp/english/safety/info-services/drugs/revision-of-precautions/0001.html), specifically on drug–ADR pairs in which a new ADR was added to the CSAR section because of the accumulation of ADR cases, including revisions of the CCDS (which are made mainly on the basis of accumulated overseas ADR cases).¹²⁾ The PMDA website compiles the instructions for the revision of precautions for use by fiscal year, and the reason for each revision is documented in the “investigation results and background of the revision” section of the “summary of investigation results” report (PDF). We obtained these reports for the analysis period from the PMDA website and extracted the prediction targets from them. The analysis period was from August 2011 to March 2022. This period was chosen because no “summary of investigation results” documents stating the reasons for revisions existed before August 2011, making it impossible to identify those reasons. For feature creation, we used the Japanese Adverse Drug Event Report (JADER) database, which aggregates ADR information in Japan, and the U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS), which compiles ADR information in the United States.
The FAERS data used in this study had been curated by the Japan Pharmaceutical Information Center, which removed duplicate reports and corrected drug names. JADER is a database provided by the PMDA and consists of four tables: DEMO (sex, age, weight, and other patient characteristics), DRUG (drug information, including name and other properties), REAC (types of ADRs and their outcomes), and HIST (medical history). In this study, features were created from three of these tables (HIST was excluded), using information up to March 2022. The DRUG table lists concomitant and interacting drugs in addition to suspected drugs, and it includes both OTC and prescription drugs; we examined only suspected prescription drugs. Risks associated with concomitant drugs and interactions are addressed in PI sections other than the CSAR section, so including them could have biased the prediction of CSAR additions; restricting the scope to suspected drugs allowed the need for safety measures to be evaluated for the suspected drugs themselves. The REAC table contains the ADR onset date, and records in which this date preceded the initial administration date of the drug were excluded from the dataset. FAERS is a spontaneous adverse event reporting system operated by the FDA. It comprises seven tables: DEMO (age, sex, country of occurrence, and ADR onset date), DRUG (drug information, including name, route, and other properties), REAC (ADR name), OUTC (case outcomes), RPSR (report sources), INDI (drug indications), and THER (therapy start and end dates).
Features were generated from the FAERS DEMO, DRUG, REAC, and THER tables using data through March 2022. Because FAERS and JADER began collecting data at different times, only FAERS data from 2004 onward were used, to align with the JADER data collection period. FAERS also contains case reports from Japan, and these were excluded. As with JADER, the analysis was limited to suspected prescription drugs (primary and secondary suspect drugs), and records in which the ADR onset date preceded the initial administration of the drug were excluded from the dataset. The data extraction process is shown in Fig. 1.
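The FAERS exclusion rules described above can be expressed as a per-record filter. The following is a minimal sketch, not the authors' pipeline: the `Report` record and its field names are hypothetical stand-ins for columns joined from the DEMO, DRUG, REAC, and THER tables, and keeping records with missing dates is an assumption the text does not address. The role codes `PS` and `SS` follow the FAERS convention for primary and secondary suspect drugs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

SUSPECT_ROLES = {"PS", "SS"}  # FAERS role codes: primary / secondary suspect drug


@dataclass(frozen=True)
class Report:
    """One drug-ADR row after joining FAERS tables (hypothetical schema)."""
    case_id: str
    drug: str
    adr_pt: str
    role: str                   # "PS", "SS", "C" (concomitant), or "I" (interacting)
    is_prescription: bool
    country: str                # country of occurrence, e.g. "US", "JP"
    report_date: date
    start_date: Optional[date]  # first administration of the drug (THER)
    onset_date: Optional[date]  # ADR onset (DEMO)


def keep_report(r: Report, study_start: date = date(2004, 1, 1),
                study_end: date = date(2022, 3, 31)) -> bool:
    """Apply the exclusion criteria from the text to a single record."""
    if r.role not in SUSPECT_ROLES:          # suspected drugs only
        return False
    if not r.is_prescription:                # prescription drugs only
        return False
    if r.country == "JP":                    # Japanese cases are covered by JADER
        return False
    if not (study_start <= r.report_date <= study_end):  # align with JADER period
        return False
    # Drop records whose ADR onset precedes the first administration.
    # Assumption: records with a missing date are kept.
    if r.start_date and r.onset_date and r.onset_date < r.start_date:
        return False
    return True


def clean(reports: Iterable[Report]) -> list[Report]:
    return [r for r in reports if keep_report(r)]
```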

Fig. 1. Data Extraction Scheme for Positive and Negative Cases

A total of 414 positive and 94391 negative drug–ADR pairs were extracted from the JADER and FAERS databases. JADER, Japanese Adverse Drug Event Report; FAERS, U.S. Food and Drug Administration Adverse Event Reporting System; ADR, adverse drug reaction; CSAR, clinically significant adverse reaction; PI, package insert; PT, preferred term; MedDRA, Medical Dictionary for Regulatory Activities.

Target

The prediction targets in this study were drug–ADR pairs added to the CSAR section of PIs owing to the accumulation of domestic cases, overseas cases, or both, or to revisions of the CCDS. Pairs added on the basis of interim clinical trial results or literature information other than case accumulation were excluded from the prediction targets. Negative samples were drug–ADR pairs that met both of the following criteria:

1. The ADR, which is one of the ADRs included in the prediction targets of this study, was not listed in the CSAR section of the drug's PI as of March 2022.
2. JADER or FAERS contains at least one report of the ADR for the drug.
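One reading of the two negative-sample criteria is sketched below. It assumes hypothetical inputs: the (drug, PT) pairs of all suspected-drug reports in JADER or FAERS, the set of PTs that occur among the prediction targets, and a mapping from each drug to the PTs in its CSAR section as of March 2022. Because positive pairs had already been added to the CSAR section by March 2022, the same filter also keeps them out of the negative set.

```python
from typing import Iterable


def select_negative_pairs(
    reported_pairs: Iterable[tuple[str, str]],
    target_adrs: set[str],
    csar_section: dict[str, set[str]],
) -> set[tuple[str, str]]:
    """Return drug-ADR pairs that satisfy both negative-sample criteria.

    reported_pairs: (drug, MedDRA PT) for each suspected-drug report in JADER or FAERS
    target_adrs:    PTs that occur among the prediction targets
    csar_section:   drug -> PTs listed in its PI's CSAR section as of March 2022
    """
    negatives = set()
    for drug, adr in set(reported_pairs):         # criterion 2: >= 1 report exists
        if adr not in target_adrs:                # only ADRs used as targets
            continue
        if adr in csar_section.get(drug, set()):  # criterion 1: not (yet) a CSAR
            continue
        negatives.add((drug, adr))
    return negatives
```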

The prediction targets were defined by the preferred terms (PTs) from version 25.0 of the Medical Dictionary for Regulatory Activities (MedDRA) that precisely matched the CSAR names. The CSARs included in the prediction targets are listed in Supplementary Table S1.

Feature Creation

To enable early detection of CSAR additions, we created 1701 features from information two, three, and four quarters before the addition. These features were generated from the JADER and FAERS databases and included attributes derived from JADER, from FAERS, and from both databases combined. They included features found effective in our previous study, such as the “value of disproportionality analysis,” “the average number of patients reported per quarter,” and “index B: the percentage of ADR cases attributed to the suspected drug out of the total ADR cases attributed to all drugs.” Additionally, basic features were built from signal-strengthening information described by the Council for International Organizations of Medical Sciences (CIOMS) Working Group VIII, the Guideline on Good Pharmacovigilance Practices Module VIII, and previous studies. Features were also created from information suggesting a causal relationship between a suspected drug and an ADR and from information relevant to the pharmacovigilance activities of MAHs, including the numbers of ADR reports, deaths, and patients with recurrence on re-administration, and the number of days from administration to event onset.^5,6,13,14) In addition, interaction features were created by subtracting and multiplying the same feature at different time points. The disproportionality signals and the values relative to other drug–ADR pairs were computed as follows:

Using a 2 × 2 table, we defined a, b, c, and d as follows:

a: the number of cases of the target ADR reported with the suspected drug.
b: the number of cases of the target ADR reported with all other drugs.
c: the number of cases of all other ADRs reported with the suspected drug.
d: the number of cases of all other ADRs reported with all other drugs.
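The formulas are not spelled out in the text, so the following is a sketch under the 2 × 2 definitions above; note that b holds the target ADR with other drugs, so the PRR denominator is b/(b + d). It assumes the standard PRR, Pearson's chi-squared test with Yates continuity correction (as in the Table 1 legend), Index A = a/(a + c) and Index B = a/(a + b) expressed as percentages (our reading of the legend), and, for the signal flag, the commonly used Evans criteria (PRR ≥ 2, χ² ≥ 4, a ≥ 3), which the study does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    a: int  # target ADR, suspected drug
    b: int  # target ADR, all other drugs
    c: int  # all other ADRs, suspected drug
    d: int  # all other ADRs, all other drugs

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def prr(self) -> float:
        """Proportional reporting ratio: [a/(a+c)] / [b/(b+d)]."""
        if self.a + self.c == 0 or self.b == 0:
            return math.nan
        return (self.a / (self.a + self.c)) / (self.b / (self.b + self.d))

    def chi2_yates(self) -> float:
        """Pearson chi-squared (1 df) with Yates continuity correction."""
        a, b, c, d, n = self.a, self.b, self.c, self.d, self.n
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            return math.nan
        corrected = max(abs(a * d - b * c) - n / 2, 0.0)  # clip at zero
        return n * corrected ** 2 / denom

    def index_a(self) -> float:
        """% of the suspected drug's ADR cases that are the target ADR."""
        return 100 * self.a / (self.a + self.c) if self.a + self.c else math.nan

    def index_b(self) -> float:
        """% of target-ADR cases (all drugs) attributed to the suspected drug."""
        return 100 * self.a / (self.a + self.b) if self.a + self.b else math.nan

    def is_signal(self) -> bool:
        """Assumed Evans-style rule: a >= 3, PRR >= 2, and chi2 >= 4."""
        return self.a >= 3 and self.prr() >= 2 and self.chi2_yates() >= 4
```

For the illustrative counts a = 20, b = 180, c = 980, d = 98820, this gives PRR = 11.0, Index A = 2.0%, Index B = 10.0%, and a Yates χ² of about 155, so the pair would be flagged under the assumed rule.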

These features were designed separately for each reporter type and for specific subgroups because the characteristics of reports differ between the JADER and FAERS databases.¹⁵⁾ Specifically, a large proportion of FAERS reports originate from consumers, whereas most JADER reports come from healthcare professionals. We hypothesized that this difference in report sources could affect data quality and the predictive performance of the models. Unlike FAERS, JADER does not distinguish primary from secondary suspected drugs. When the suspected drug and the ADR have a 1 : 1 relationship, the cause of the ADR is clearer than in a many-to-one relationship, and this information is important for judging the causal relationship between a suspected drug and an ADR. Therefore, in addition to the features created using all suspected drugs, features were created for cases with 1 : 1, 1 : 2, and 2 : 1 relationships between suspected drugs and ADRs. Table 1 lists the features generated from JADER and FAERS data.
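The subgrouping by suspected drug–ADR relationship can be sketched as follows. This is an illustrative assumption rather than the study's code: each case is represented as a pair of sets (suspected drugs, ADR PTs), “1 : 2” is read as one suspected drug with two ADRs and “2 : 1” as two suspected drugs with one ADR, and every case also contributes to an “all” group.

```python
from collections import Counter
from typing import Iterable, Optional

# (number of suspected drugs, number of ADRs) -> subgroup label
GROUPS = {(1, 1): "1:1", (1, 2): "1:2", (2, 1): "2:1"}


def relationship(suspected_drugs: set[str], adrs: set[str]) -> Optional[str]:
    """Return the subgroup label of a case, or None if it fits no subgroup."""
    return GROUPS.get((len(suspected_drugs), len(adrs)))


def pair_counts_by_group(cases: Iterable[tuple[set[str], set[str]]]) -> Counter:
    """Count cases per (drug, ADR, group), where group is 'all' or a subgroup label."""
    counts: Counter = Counter()
    for drugs, adrs in cases:
        group = relationship(drugs, adrs)
        for drug in drugs:
            for adr in adrs:
                counts[(drug, adr, "all")] += 1
                if group is not None:
                    counts[(drug, adr, group)] += 1
    return counts
```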

Table 1. Feature List in the Predictive Model

Feature category	Derived from JADER	Derived from FAERS	Derived from both JADER and FAERS
Basic features	Number of reporters, number of deaths, number of patients with recurrence on re-administration, number of discontinued patients, number of days from administration to event onset (average, median), number of patients with event onset within N days of administration (N = 15, 30, 90), average number of missing values per report, average number of reports per case, number of quarters since the first AE was reported, average number of reports per quarter. These are all segmented by reporter type and within the groups where the ADR to suspected drug relationship is limited to 1 : 1, 1 : 2, or 2 : 1.	Number of reporters, number of reporters from physicians, number of reporters from pharmacists, number of reporters from other healthcare professionals, number of reporters from consumers, total number of reporters from healthcare professionals, number of reporters from the United States, number of reporters from the EU’s 5 major countries, number of primary suspect drugs, number of quarters since the first AE was reported, average number of reports per quarter	Number of reporters from Japan and the United States, number of reporters from Japan and the EU’s 5 major countries, difference in the number of quarters since the first report in Japan and the first report overseas, ratio of reporters from Japan to the United States, ratio of reporters from Japan to the EU’s 5 major countries
Disproportionality analysis	PRR, chi-squared test, signals determined by PRR. These features were calculated for each report count according to the conditions computed in Basic features.	PRR, chi-squared test, signals determined by PRR. These features were calculated for each report count according to the conditions computed in Basic features.	Signal values based on the number of reports from JADER and FAERS (PRR, chi-squared test), signal values based on the number of reports from JADER and the EU’s 5 major countries in FAERS (PRR, chi-squared test), signals determined by PRR
Relative features	Index A, Index B. These features were computed for each report count based on the conditions computed in Basic features.
Lag features	With the point two quarters before the addition of the CSAR to the PI as the reference, the basic features, disproportionality analysis, relative features, and interaction features were generated for one and two quarters before that reference point.
Interaction features	Subtraction and multiplication were performed between the same features at different time points.

Index A, percentage of a specific type of ADR cases due to the suspected drug out of all ADR cases that occurred after using the same drug; Index B, percentage of ADR cases attributed to the suspected drug out of the total ADR cases from all drugs; PRR, proportional reporting ratio; chi-squared test, Pearson’s chi-squared test with Yates continuity correction; CSAR, clinically significant adverse reaction; PI, package insert; JADER, Japanese Adverse Drug Event Report; FAERS, the U.S. Food and Drug Administration Adverse Event Reporting System.
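The lag and interaction features in Table 1 can be sketched for a single drug–ADR pair as follows. The quarterly input format, the feature-naming scheme, and the use of all pairwise combinations of the 2-, 3-, and 4-quarter lags are assumptions for illustration; how the reference quarter was set for negative pairs is not described here, so `event_quarter` is simply a parameter.

```python
from itertools import combinations

LAGS = (2, 3, 4)  # quarters before the CSAR addition


def lagged_features(series: dict[int, dict[str, float]],
                    event_quarter: int) -> dict[str, float]:
    """Build lag and interaction features for one drug-ADR pair.

    series: quarter index -> {feature name: value at the end of that quarter}
    event_quarter: quarter index in which the CSAR was added
    """
    out: dict[str, float] = {}
    # Lag features: the same base features observed 2, 3, and 4 quarters earlier.
    for lag in LAGS:
        for name, value in series.get(event_quarter - lag, {}).items():
            out[f"{name}_q-{lag}"] = value
    names = {key.rsplit("_q-", 1)[0] for key in out}
    # Interaction features: difference and product of one feature at two time points.
    for name in sorted(names):
        for l1, l2 in combinations(LAGS, 2):
            k1, k2 = f"{name}_q-{l1}", f"{name}_q-{l2}"
            if k1 in out and k2 in out:
                out[f"{name}_diff_q-{l1}_q-{l2}"] = out[k1] - out[k2]
                out[f"{name}_prod_q-{l1}_q-{l2}"] = out[k1] * out[k2]
    return out
```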

Model Development and Performance Evaluation

Linear and nonlinear models were developed using the automated ML platform DataRobot. Its repository contains more than 3000 sets of procedures for data processing, feature engineering, and ML algorithms, including support vector machines, elastic net classifiers, regularized logistic regression, light gradient boosting machines, extreme gradient boosting (XGBoost), neural network classifiers, and other ML models. The platform can also create more robust and accurate ensemble models, namely elastic net, average, median, and generalized linear model (GLM) blenders, by combining the predictions of two or more models. The software automatically selects and executes suitable sets of procedures to explore patterns in the data.
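DataRobot's blender internals are not described in the text, so the following is only a generic stacking sketch of what a GLM blender does: a logistic regression (a GLM with a logit link) fitted to the logits of base-model probabilities, here by plain batch gradient descent. Treating the base probabilities as out-of-fold predictions, the learning rate, and the epoch count are illustrative assumptions.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return math.log(p / (1.0 - p))


def fit_glm_blender(base_preds: Sequence[Sequence[float]], y: Sequence[int],
                    lr: float = 0.1, epochs: int = 2000) -> list[float]:
    """Fit [intercept, w_1, ..., w_M] on logits of M base-model probabilities.

    base_preds[i][m] is model m's (out-of-fold) probability for sample i.
    """
    X = [[1.0] + [_logit(p) for p in row] for row in base_preds]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]  # mean log-loss gradient
    return w


def blend(weights: Sequence[float], probs: Sequence[float]) -> float:
    """Blended probability for one sample from its base-model probabilities."""
    z = weights[0] + sum(wj * _logit(p) for wj, p in zip(weights[1:], probs))
    return _sigmoid(z)
```

For the study's best model, the two base learners would correspond to the neural network and XGBoost models mentioned in the Results.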

All models were validated using stratified fivefold cross-validation and a holdout set. Before model development, 20% of the dataset was randomly selected as the holdout, which was never used for training or validation. The remaining data were divided into five mutually exclusive folds; four were used for training and the fifth for validation. Each algorithm was trained five times so that each fold was used once for validation. To reduce overfitting, cross-validation scores were calculated as the mean logarithmic loss over the five validation folds. Finally, the models were evaluated on the holdout set to assess how well their performance generalizes to new data. Because positive cases were rare and the data were highly imbalanced, with far more negative than positive cases, the effect of holdout partitioning on the predictive performance of the model was evaluated using five random seeds.
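The partitioning scheme can be sketched with the standard library. This is not DataRobot's partitioning code: stratifying the holdout by class (the text states only that it was random) and assigning folds round-robin after shuffling are assumptions. Calling it with five different seeds mirrors the seed-randomization check described above.

```python
import random
from collections import defaultdict
from typing import Sequence


def stratified_holdout_and_folds(labels: Sequence[int], holdout_frac: float = 0.2,
                                 n_folds: int = 5, seed: int = 0):
    """Return (holdout indices, list of n_folds fold index lists), stratified by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    holdout: list[int] = []
    folds: list[list[int]] = [[] for _ in range(n_folds)]
    for idx in by_label.values():
        rng.shuffle(idx)
        n_hold = round(len(idx) * holdout_frac)
        holdout.extend(idx[:n_hold])          # never used for training or validation
        for k, i in enumerate(idx[n_hold:]):
            folds[k % n_folds].append(i)      # round-robin keeps class ratios even
    return sorted(holdout), [sorted(f) for f in folds]
```

With 414 positives and 94391 negatives, this places 83 positives in the holdout and 66 or 67 in each fold; iterating `seed` over five values illustrates the idea of the seed check, not DataRobot's actual partitions.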

The models were developed with two feature sets: the informative features, a set of 1511 features that excluded those DataRobot automatically judged to have too few values to be informative; and the top 100 features, the 100 most important features in the best-performing model built with the informative features. Among the models built with both feature sets, the one with the highest evaluation metric was adopted as the final predictive model. The MCC was used as the evaluation metric because it accounts for data imbalance.
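The MCC and the other metrics in Table 2 follow directly from the confusion matrix. As an illustrative check, the counts below are reconstructed rather than reported: TP = 71 and FN = 12 follow from the 83 holdout positives and the per-reason false negatives given in the Results, FP = 4 is back-calculated from the reported holdout PPV of 0.9467, and TN = 18874 assumes that about 20% of the 94391 negatives (18878) were in the holdout.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; 0 by convention if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def summarize(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Metrics reported in Table 2, computed from one confusion matrix."""
    return {
        "MCC": mcc(tp, fp, tn, fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Reconstructed (not reported) holdout counts for seed 1.
print(summarize(tp=71, fp=4, tn=18874, fn=12))
```

With these counts, the MCC is about 0.8995, with a sensitivity of 0.8554 and a PPV of 0.9467, in line with seed 1 in Table 2. By contrast, a classifier that labels every pair negative would reach about 99.6% accuracy but an MCC of 0, which illustrates why accuracy is uninformative for data this imbalanced.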

The relative importance of features was assessed using permutation importance, as described by Breiman.¹⁶⁾ This method is widely used in ML because it can be applied to both linear and nonlinear models. To calculate the permutation importance of a feature, its values in the validation data are randomly shuffled (reordered) while all other features remain unchanged. If the model relies heavily on a feature, shuffling it should substantially reduce the performance score. We calculated the permutation importance of all features and, to allow comparison across models, normalized the values by dividing them by the maximum ratio of the resulting performance score to the original score. Analysis and modeling were performed using Python 3.9.3 and DataRobot 3.2.0.
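A model-agnostic permutation importance can be sketched as follows. Here importance is taken as the drop in the metric after shuffling one feature, scaled so that the most important feature equals 1; this is one common normalization and may differ in detail from the ratio-based normalization described above. `predict` and `metric` are placeholders for any fitted model and evaluation metric (for example, MCC), and a single shuffle per feature is used for brevity.

```python
import random
from typing import Callable, Sequence

Predict = Callable[[list[list[float]]], list[float]]
Metric = Callable[[Sequence[int], Sequence[float]], float]


def permutation_importance(predict: Predict, X: list[list[float]], y: Sequence[int],
                           metric: Metric, seed: int = 0) -> list[float]:
    """Normalized score drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    base = metric(y, predict(X))
    drops = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the link between feature j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - metric(y, predict(X_perm)))
    top = max(drops)
    return [d / top for d in drops] if top > 0 else drops
```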

RESULTS

Target Characteristics

During the target period, 702 drug–ADR pairs were added to PIs as CSARs. Of these, 648 (92.3%) were added because of the accumulation of ADR cases, including CCDS revisions. The analysis included 414 positive and 94391 negative cases. Of the positive cases, 302 drug–ADR pairs were added because of the accumulation of domestic cases, 22 because of both domestic and overseas cases, 12 because of overseas cases, and 78 because of CCDS revisions. Among the added CSARs, interstitial lung disease was the most common (38 cases), followed by abnormal hepatic function (34 cases) and anaphylactic reaction (31 cases). Whereas 36 cases (94.7%) of interstitial lung disease were associated with the accumulation of domestic cases, 11 cases (35.5%) of anaphylactic reaction were associated with the accumulation of overseas cases, including CCDS revisions, a high percentage among the main CSARs (Fig. 2).

Fig. 2. Percentage Distribution of Revision Reasons for the Top 10 Adverse Drug Reactions in Target Clinically Significant Adverse Reactions (CSARs)

CCDS, company core data sheet.

Comparison of Prediction Models and Feature Importance

Across all ML models and seed values, the GLM blender combining a neural network model and XGBoost with informative features consistently showed the highest predictive performance for CSAR additions to PIs, with mean MCCs of 0.8713 in cross-validation and 0.8677 on the holdout (Table 2). In contrast, models using the top 100 features consistently performed worse than those using informative features at every seed, with mean MCCs of 0.8096 in cross-validation and 0.7986 on the holdout (Supplementary Table S2). In the seed with the highest predictive performance, the holdout targets included 60, 3, and 4 additions due to the accumulation of domestic cases, both domestic and overseas cases, and overseas cases, respectively, and 16 additions due to CCDS revisions. Among these, seven (11.7%) of the additions due to domestic case accumulation were false negatives, as were five (21.7%) of those associated with the accumulation of overseas ADR cases: one (33.3%) due to both domestic and overseas cases, three (75%) due to overseas cases, and one (6.3%) due to CCDS revisions.

Table 2. Prediction Performance of the Best Model at Each Seed

Seed No.	Model	Best feature set	Cross-validation MCC	Cross-validation sensitivity	Cross-validation specificity	Cross-validation PPV	Cross-validation NPV	Holdout MCC	Holdout sensitivity	Holdout specificity	Holdout PPV	Holdout NPV
1	GLM blender	Informative features	0.8754	0.8358	0.9997	0.9180	0.9993	0.8995	0.8554	0.9998	0.9467	0.9994
2	GLM blender	Informative features	0.8827	0.8358	0.9997	0.9333	0.9993	0.8639	0.8434	0.9995	0.8861	0.9993
3	GLM blender	Informative features	0.8891	0.8060	0.9999	0.9818	0.9991	0.8556	0.8193	0.9996	0.8947	0.9992
4	GLM blender	Informative features	0.8546	0.8358	0.9995	0.8750	0.9993	0.8402	0.7831	0.9996	0.9028	0.9990
5	GLM blender	Informative features	0.8548	0.8806	0.9992	0.8310	0.9995	0.8792	0.8193	0.9998	0.9444	0.9992
Mean (S.D.)			0.8713 (0.0159)	0.8388 (0.0267)	0.9996 (0.0003)	0.9078 (0.0575)	0.9993 (0.0001)	0.8677 (0.0227)	0.8241 (0.0277)	0.9997 (0.0001)	0.9149 (0.0286)	0.9992 (0.0001)

S.D., standard deviation; GLM, generalized linear model; MCC, Matthews correlation coefficient; PPV, positive predictive value; NPV, negative predictive value.

In this study, we aimed to develop a model that predicts CSAR additions to PIs using accumulated ADR case data, regardless of whether the cases are domestic or overseas. To determine whether the new model expands the scope of prediction beyond that of our previous study, we evaluated how well the previous model predicted, in the holdout dataset, CSAR additions associated with the accumulation of overseas ADR cases, that is, additions due to the accumulation of overseas cases, the accumulation of both domestic and overseas cases, and revision of the CCDS. We used the same model, hyperparameters, and features that yielded the best results in our previous study.¹¹⁾ Of the 23 such cases, ten drug–ADR pairs were not recorded in JADER, so the former model could not predict them. Of the remaining 13 cases, the former model correctly predicted the CSAR addition in only one (4.3% of the 23 cases).

In the best GLM, the top five features by permutation importance were “Index B derived from JADER + FAERS case reports 2 quarters pre-event,” “PRR from JADER case reports + reports from major 5 EU countries 2 quarters pre-event,” “PRR from JADER case reports + reports from major 5 EU countries 3 quarters pre-event,” “Index B derived from JADER + reports from major 5 EU countries 3 quarters pre-event,” and “Number of quarters elapsed since the first report of the ADR derived from JADER 2 quarters pre-event.” Among them, “Index B derived from JADER + FAERS case reports 2 quarters pre-event” had the highest impact (Fig. 3). When the top five features were identified for each seed, “PRR from JADER case reports + reports from major 5 EU countries 3 quarters pre-event” was among them in all seeds, and “Index B derived from JADER + FAERS case reports 2 quarters pre-event” was among them in four of the five seeds (Supplementary Table S3).

Fig. 3. Importance of Permutation in the Best Generalized Linear Model

JADER, Japanese Adverse Drug Event Report; FAERS, U.S. Food and Drug Administration Adverse Event Reporting System; EU, European Union; PRR, proportional reporting ratio; Index B, ratio of adverse events associated with a specific drug to those associated with all drugs.

DISCUSSION

This study presents the first ML model capable of predicting the addition of CSARs to PIs based on the accumulation of ADR cases, whether that accumulation was domestic, overseas, or both. The best model in this study had an MCC of 0.8995 and correctly predicted 88.3% (53/60) of additions due to domestic cases, 66.7% (2/3) of those due to both domestic and overseas cases, 25% (1/4) of those due to overseas cases, and 93.8% (15/16) of those due to CCDS revisions. Furthermore, across different seed values, the mean MCC was 0.8677, indicating that CSAR additions due to the accumulation of ADR cases can be predicted with high performance using JADER, FAERS, and ML. The new model correctly predicted 78.3% (18/23) of CSAR additions associated with the accumulation of overseas ADR cases, comprising the accumulation of overseas cases, the accumulation of both domestic and overseas cases, and revision of the CCDS. This result suggests that the model achieves a certain level of predictive performance when any of these three overseas-related reasons underlies a CSAR addition. However, when each reason is examined individually, most predictions for additions based solely on the accumulation of overseas ADR cases were incorrect. Because such additions are infrequent compared with those due to the accumulation of domestic ADR cases or CCDS revisions, with only four instances in the holdout dataset, it is difficult to conclude that the model is inapplicable to them. Therefore, the obtained results alone are insufficient to assess predictive performance for CSAR additions based on the accumulation of overseas ADR cases.
However, because CCDS revisions, which were predicted with high accuracy, are themselves driven by the accumulation of overseas ADR cases, we consider that CCDS revisions and the accumulation of overseas ADR cases should be evaluated together. In that case, the model correctly predicted 80.0% (16/20) of these additions, suggesting that it is sufficiently practical for predicting CSAR additions related to the accumulation of overseas ADR cases. The performance of the model from our previous study further supports the benefit of the new model for such additions: the previous model correctly predicted only 4.3% (1/23) of the CSAR additions associated with the accumulation of overseas ADR cases in the holdout dataset of this study. Examination of the mismatched predictions showed that 43.5% (10/23) involved drug–ADR pairs not recorded in JADER; that is, a CSAR may be added to a drug's PI even when no corresponding ADR reports have been made in Japan. Even when such reports existed, the previous model rarely produced accurate predictions. Hence, the model developed in this study, which predicts CSAR additions due to the accumulation of ADR cases regardless of whether they are domestic or overseas, offers a practical advantage.
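The per-reason and pooled percentages quoted above follow from the holdout counts for the best seed reported in the Results (holdout positives and false negatives per revision reason). The short sketch below recomputes them; the reason labels are our own shorthand.

```python
from typing import Iterable

# Best seed, holdout set: reason -> (positives, false negatives), from the Results.
HOLDOUT = {
    "domestic": (60, 7),
    "domestic_and_overseas": (3, 1),
    "overseas": (4, 3),
    "ccds_revision": (16, 1),
}


def recall(reasons: Iterable[str]) -> tuple[int, int]:
    """(correctly predicted, total) holdout positives, pooled over the given reasons."""
    reasons = list(reasons)
    total = sum(HOLDOUT[r][0] for r in reasons)
    missed = sum(HOLDOUT[r][1] for r in reasons)
    return total - missed, total


for label, group in [
    ("domestic", ["domestic"]),
    ("overseas-related", ["domestic_and_overseas", "overseas", "ccds_revision"]),
    ("overseas + CCDS", ["overseas", "ccds_revision"]),
]:
    correct, total = recall(group)
    print(f"{label}: {correct}/{total} = {correct / total:.1%}")
```

This prints 53/60 = 88.3%, 18/23 = 78.3%, and 16/20 = 80.0%, matching the figures in the text.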

Among the top 10 features contributing to this high predictive performance, three were derived from JADER and seven from both databases combined. The prominence of features drawing on both databases among the most important features suggests that information from both databases is important for predicting the targets and that the model predicts CSAR additions regardless of whether cases accumulated domestically or overseas. That features derived from only one of the databases did not dominate the top ranks can also be explained by the predictive performance for each revision reason: the model correctly predicted 88.3% of CSAR additions based on the accumulation of domestic ADR cases and 78.3% of those associated with the accumulation of overseas ADR cases. Given these high accuracies for both groups, it is natural that features originating from both the JADER and FAERS databases were selected among the top 10. Furthermore, among the top five features of the best model for each seed, PRR and Index B were consistently important, the same trend observed in our previous study, which focused on additions due to the accumulation of domestic cases.¹¹⁾ Additionally, a study analyzing vaccine PI revisions found that significant disproportionality appeared a median of 16.5 months before the PI revision, suggesting that PI revisions may be strongly influenced by early disproportionality results.¹⁷⁾ These findings suggest that early disproportionality analysis results and relative indicators compared with other drugs or other ADRs are important for predicting the addition of CSARs to PIs due to the accumulation of ADR cases.

Our previous ML model for predicting the addition of CSARs to PIs achieved high predictive performance by focusing specifically on additions due to the accumulation of domestic cases.^11) Figure 2 shows that the reasons for adding CSARs to PIs differ depending on the ADR. In other words, the previous model was particularly adept at predicting whether certain ADRs would be added to PIs as CSARs. These were ADRs whose additions stem mainly from accumulated domestic cases, such as interstitial lung disease, abnormal hepatic function, decreased platelet count, and rhabdomyolysis. We also believe that the previous model is effective for drugs approved in Japan ahead of other countries. The model constructed in the current study also learns from FAERS data. It therefore predicts CSAR additions due to the accumulation of domestic cases, as well as those due to the accumulation of overseas cases and to CCDS revisions, which the previous model could not predict. This indicates that a single model can cover approximately 90% of CSAR additions. As noted above, the reasons for adding CSARs to PIs depend on the targeted ADR. However, this dependence is not definitive, and the reason for a given addition is difficult to foresee. Therefore, compared with the previous model, whose usable conditions were limited, the current model is highly practical because it can be used without considering the reason for revision.

We used the MCC as the optimization metric for our model. However, for application to safety measures, a model that prioritizes sensitivity and tolerates a certain level of false positives might be more practical, even at some cost to overall predictive performance. Our model aims to reduce both false positives and false negatives rather than prioritizing sensitivity alone. Nevertheless, its average sensitivity on the holdout dataset was 0.8241, meaning that it correctly identified approximately 82% of the cases in which CSARs were added to PIs. The MCC is a balanced measure that accounts for both false positives and false negatives in imbalanced data. Because the model maintained this high sensitivity while maximizing the MCC, it appears to perform well enough to support examination of the need for safety measures.
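To make the trade-off concrete, the sketch below computes the MCC and sensitivity from confusion-matrix counts:

MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

The counts are hypothetical (100 positives, 900 negatives) and are not the study's holdout results. They show why the MCC suits imbalanced data better than accuracy does.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def sensitivity(tp: int, fn: int) -> float:
    """Share of actual CSAR additions (positives) that the model flags."""
    return tp / (tp + fn)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all drug-ADR pairs classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical imbalanced holdout: 100 additions, 900 non-additions.
screening_model = dict(tp=82, fp=20, fn=18, tn=880)
never_flags = dict(tp=0, fp=0, fn=100, tn=900)  # predicts "no addition" for every pair

for name, cm in [("screening_model", screening_model), ("never_flags", never_flags)]:
    print(name,
          f"accuracy={accuracy(**cm):.3f}",
          f"sensitivity={sensitivity(cm['tp'], cm['fn']):.3f}",
          f"MCC={mcc(**cm):.3f}")
```

The trivial classifier that never predicts an addition reaches 0.900 accuracy but an MCC of 0.000 and a sensitivity of 0. A model can therefore score well on the MCC only by finding true additions while keeping false positives low.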

Signals detected through disproportionality analysis are merely candidates for risk.^5,6) Of the 1605 signals detected in 2022, only 39 (2.4%) were reportedly validated by the Pharmacovigilance Risk Assessment Committee of the European Medicines Agency.^18) This suggests that traditional signal detection methods, such as disproportionality analysis, are not sufficient on their own for examining the need for safety measures. Our model enables an efficient examination of that need by accurately identifying drug–ADR pairs with a high probability of being added to PIs as CSARs. However, this model addresses only one safety measure: the addition of CSARs to PIs. Expert review of disproportionality analyses and individual case assessments remains important for other safety measures. These include measures addressing drug interactions or patients with specific backgrounds, and the addition of ADRs already listed as CSARs to the warning section. In other words, conventional methods of reviewing the necessity of safety measures and this model are complementary; one does not preclude the other.
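Threshold-based disproportionality screening works roughly as follows. This minimal sketch assumes the criteria described by Evans et al.^7): PRR ≥ 2, chi-square ≥ 4, and at least three cases. The chi-square here uses a Yates continuity correction, which is one common choice; implementations differ. The drug names, ADRs, and counts are hypothetical. Every pair the screen flags is still only a candidate that requires further assessment.

```python
from typing import NamedTuple


class PairCounts(NamedTuple):
    """2x2 counts for one drug-ADR pair (hypothetical values)."""
    drug: str
    adr: str
    a: int  # drug of interest with the ADR of interest
    b: int  # drug of interest with other ADRs
    c: int  # other drugs with the ADR of interest
    d: int  # other drugs with other ADRs


def prr(p: PairCounts) -> float:
    """Proportional reporting ratio [a / (a + b)] / [c / (c + d)]; assumes c > 0."""
    return (p.a / (p.a + p.b)) / (p.c / (p.c + p.d))


def yates_chi2(p: PairCounts) -> float:
    """Chi-square for the 2x2 table with Yates continuity correction."""
    n = p.a + p.b + p.c + p.d
    num = n * max(0.0, abs(p.a * p.d - p.b * p.c) - n / 2) ** 2
    den = (p.a + p.b) * (p.c + p.d) * (p.a + p.c) * (p.b + p.d)
    return num / den if den else 0.0


def is_signal(p: PairCounts, min_prr: float = 2.0, min_chi2: float = 4.0,
              min_cases: int = 3) -> bool:
    """Flag a pair only if it meets all three screening thresholds."""
    return p.a >= min_cases and prr(p) >= min_prr and yates_chi2(p) >= min_chi2


pairs = [
    PairCounts("drug_X", "rhabdomyolysis", a=12, b=988, c=300, d=98_700),
    PairCounts("drug_Y", "headache", a=2, b=498, c=400, d=99_100),
    PairCounts("drug_Z", "nausea", a=30, b=970, c=2_500, d=96_500),
]
for p in pairs:
    print(p.drug, p.adr, f"PRR={prr(p):.2f}", f"chi2={yates_chi2(p):.1f}", is_signal(p))
```

In this toy set, drug_Y fails the minimum case count and drug_Z fails the PRR threshold, so only drug_X is flagged. Whether the flagged pair warrants a CSAR addition is a separate question. That question is the one the predictive model and expert review are meant to address.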

The target users of this model include the PMDA and MAHs. To streamline the workflow of safety measure considerations, we believe it is essential to differentiate analysis methods and predictive models by purpose: 1) apply traditional disproportionality analysis to comprehensively detect risk candidates; 2) apply this model to detect candidates for addition to PIs as CSARs on the grounds of ADR case accumulation; and 3) apply the model from the previous study to certain drugs approved in Japan before other countries or sold exclusively in Japan, and to ADRs for which, based on empirical evidence, PIs are likely to be revised because of the accumulation of domestic cases. Integrating our model into the standard workflow could make it more efficient to determine whether safety measures are necessary, conserving time and resources.
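The three-step division of labor above can be sketched as a routing function. This is a minimal illustration under several assumptions. Step 1 is represented by a list of candidate pairs already flagged by disproportionality analysis. Both models are stand-in scoring functions that return a probability, and the 0.5 threshold is arbitrary. The set of domestically driven ADRs comes from the examples named in the discussion of Figure 2. `Candidate`, `triage`, and `DOMESTIC_PRONE_ADRS` are hypothetical names, not a published tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    """A drug-ADR pair already flagged by disproportionality analysis (step 1)."""
    drug: str
    adr: str
    japan_first_or_only: bool  # approved in Japan first, or marketed only in Japan


# A scorer returns a predicted probability that the pair will be added as a CSAR.
Scorer = Callable[[Candidate], float]

# Illustrative ADRs whose CSAR additions stem mainly from domestic case accumulation.
DOMESTIC_PRONE_ADRS = frozenset({
    "interstitial lung disease", "abnormal hepatic function",
    "decreased platelet count", "rhabdomyolysis",
})


def triage(candidates: Iterable[Candidate], current_model: Scorer,
           previous_model: Scorer, threshold: float = 0.5) -> list[tuple[Candidate, list[str]]]:
    """Route step-1 candidates through both models; return pairs queued for expert review."""
    queue = []
    for cand in candidates:
        reasons = []
        # Step 2: the current model is applied to every candidate.
        if current_model(cand) >= threshold:
            reasons.append("current")
        # Step 3: the previous model only where it is expected to apply.
        in_scope = cand.japan_first_or_only or cand.adr in DOMESTIC_PRONE_ADRS
        if in_scope and previous_model(cand) >= threshold:
            reasons.append("previous")
        if reasons:
            queue.append((cand, reasons))
    return queue


# Stub scorers with made-up probabilities standing in for trained models.
cur = {("drug_A", "rhabdomyolysis"): 0.91, ("drug_B", "rash"): 0.20,
       ("drug_C", "interstitial lung disease"): 0.40}
prev = {("drug_A", "rhabdomyolysis"): 0.85, ("drug_B", "rash"): 0.70,
        ("drug_C", "interstitial lung disease"): 0.75}
cands = [
    Candidate("drug_A", "rhabdomyolysis", False),
    Candidate("drug_B", "rash", False),
    Candidate("drug_C", "interstitial lung disease", True),
]
queue = triage(cands, lambda c: cur[(c.drug, c.adr)], lambda c: prev[(c.drug, c.adr)])
for cand, reasons in queue:
    print(cand.drug, cand.adr, reasons)
```

drug_B is not queued. Its current-model score is low, and the previous model is not consulted because the pair falls outside that model's intended scope. Every queued pair still goes to expert review rather than directly to a PI revision.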

This study had some limitations. First, this study used JADER to predict CSAR additions due to the accumulation of domestic cases and FAERS to predict those due to the accumulation of overseas cases, including CCDS revisions. However, not all overseas ADR reports are stored in FAERS. A database that accumulates ADR information on a global scale could therefore enhance the prediction of CSAR additions due to the accumulation of overseas cases. That said, FAERS consists primarily of adverse event reports from the United States, but it also contains many reports from other countries, particularly in Europe. The predictive performance achieved in this study suggests that FAERS contains sufficient information. Second, the model was developed on the hypothesis that predictions can be made using information available approximately six months before CSAR information is added to PIs. However, if fatal ADRs occur frequently immediately after launch, CSAR information may be added to the PIs before six months of data have accumulated. Our model would not be able to detect such additions. Third, we have not yet quantified how much this model can streamline the consideration of safety measures. Moreover, this model does not eliminate the need for expert review. It is intended to accurately screen for drug–ADR pairs that are candidates for addition to PIs as CSARs, so that experts can review them. In the future, this model should be applied when evaluating the need for safety measures in actual cases, to verify quantitatively how much it improves the efficiency of such evaluations.
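The six-month hypothesis can be made concrete as a data cutoff. This sketch assumes that the lag is applied as whole calendar months to each CSAR addition date. It also assumes that features are built only from reports received on or before that cutoff. `months_before` and `reports_before_cutoff` are hypothetical helpers, not the study's feature pipeline, and the dates are made up.

```python
import calendar
from datetime import date
from typing import Iterable


def months_before(d: date, months: int) -> date:
    """Shift a date back by whole calendar months, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def reports_before_cutoff(report_dates: Iterable[date], addition_date: date,
                          lag_months: int = 6) -> int:
    """Count reports received on or before (addition date minus lag_months)."""
    cutoff = months_before(addition_date, lag_months)
    return sum(1 for r in report_dates if r <= cutoff)


# Hypothetical report receipt dates for one drug-ADR pair.
reports = [date(2020, 11, 5), date(2021, 2, 28), date(2021, 3, 1), date(2021, 7, 15)]
addition = date(2021, 8, 31)  # hypothetical CSAR addition date

print(months_before(addition, 6))                # day clamped from 31 to 28
print(reports_before_cutoff(reports, addition))  # only reports up to the cutoff count
```

If a drug was launched after the cutoff, the count is zero by construction. This mirrors the second limitation: additions made within about six months of launch leave no usable data window under this hypothesis.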

In conclusion, the addition of CSARs to PIs owing to the accumulation of ADR cases was predicted using JADER, FAERS, and an ML model. Combining this model with the conventional methods used to consider safety measures could potentially provide more efficient support for determining whether safety measures are necessary.

Acknowledgments

We thank the Japan Pharmaceutical Information Center (JAPIC: https://www.japic.or.jp) for curating the FAERS data.

Funding

This study was supported by JSPS KAKENHI Grant Numbers: JP21K06647, JP23K06133.

Conflict of Interest

TW is an employee of Ono Pharmaceutical Co., Ltd. KA and MT have no competing interests to report.

Data Availability

Publicly available datasets (JADER and FAERS) were analyzed herein. These datasets can be found on the PMDA and FDA websites. The authors declare that all data supporting the findings of this study are available in the paper and Supplementary Materials.

Supplementary Materials

This article contains supplementary materials.

REFERENCES

1) Pharmaceuticals and Medical Devices Agency. “Outline of post-marketing safety measures.”: ‹https://www.pmda.go.jp/english/safety/outline/0001.html›, accessed 01 November, 2023.
2) Pharmaceuticals and Medical Devices Agency. “Standard workflow for consideration of safety measures such as revision of electronic drug product package inserts.”: ‹https://www.pmda.go.jp/files/000243072.pdf›, accessed 01 November, 2023.
3) Pharmaceuticals and Medical Devices Agency. “Reference: standard workflow for consideration of safety measures.”: ‹https://www.pmda.go.jp/files/000243073.pdf›, accessed 01 November, 2023.
4) Suzuki Y, Kishi T, Nakamura M, Yamada H. Evaluation of factors influencing addition of clinically significant adverse reactions section in drug package inserts. Jpn. J. Drug Inform., 19, 17–23 (2017).
5) CIOMS. “Practical aspects of signal detection in pharmacovigilance. Report of CIOMS Working Group VIII.”: ‹https://cioms.ch/working_groups/working-group-viii/›, accessed 01 November, 2023.
6) European Medicines Agency. “Guideline on good pharmacovigilance practices (GVP) Module IX (Rev. 1).”: ‹https://www.ema.europa.eu/en/documents/scientific-guideline/guideline-good-pharmacovigilance-practices-gvp-module-ix-signal-management-rev-1_en.pdf›, accessed 01 November, 2023.
7) Evans SJ, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol. Drug Saf., 10, 483–486 (2001).
8) Hauben M, Madigan D, Gerrits CM, Walsh L, Van Puijenbroek EP. The role of data mining in pharmacovigilance. Expert Opin. Drug Saf., 4, 929–948 (2005).
9) Stephenson WP, Hauben M. Data mining for signals in spontaneous reporting databases: proceed with caution. Pharmacoepidemiol. Drug Saf., 16, 359–365 (2007).
10) Szarfman A, Machado SG, O’Neill RT. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA’s spontaneous reports database. Drug Saf., 25, 381–392 (2002).
11) Watanabe T, Ambe K, Tohkin M. Predicting the addition of information regarding clinically significant adverse drug reactions to Japanese drug package inserts using a machine-learning model. Ther. Innov. Regul. Sci., 58, 357–367 (2024).
12) Pharmaceuticals and Medical Devices Agency. “Revisions of PRECAUTIONS.”: ‹https://www.pmda.go.jp/safety/info-services/drugs/calling-attention/revision-of-precautions/0001.html›, accessed 01 November, 2023.
13) Insani WN, Pacurariu AC, Mantel-Teeuwisse AK, Gross-Martirosyan L. Characteristics of drugs safety signals that predict safety related product information update. Pharmacoepidemiol. Drug Saf., 27, 789–796 (2018).
14) Caster O, Juhlin K, Watson S, Norén GN. Improved statistical signal detection in pharmacovigilance by combining multiple strength-of-evidence aspects in vigiRank. Drug Saf., 37, 617–628 (2014).
15) Nomura K, Takahashi K, Hinomura Y, Kawaguchi G, Matsushita Y, Marui H, Anzai T, Hashiguchi M, Mochizuki M. Effect of database profile variation on drug safety assessment: an analysis of spontaneous adverse event reports of Japanese cases. Drug Des. Devel. Ther., 9, 3031–3041 (2015).
16) Breiman L. Random forests. Mach. Learn., 45, 5–32 (2001).
17) Suzuki S, Imai S, Mitsuboshi S, Kizaki H, Hashiguchi M, Hori S. Detection of vaccine adverse events before package insert revisions using a Japanese spontaneous reporting system. J. Clin. Pharmacol., 63, 903–908 (2023).
18) European Medicines Agency. “Annual report on EudraVigilance for the European Parliament, the Council, and the Commission.”: ‹https://www.ema.europa.eu/en/documents/report/2022-annual-report-eudravigilance-european-parliament-council-commission_en.pdf›, accessed 01 November, 2023.
