2025 Volume 7 Issue 2 Pages 50-60
BACKGROUND
The post-authorization safety study of a vaccine is an important public health task, and its results contribute to the decisions about whether to recommend a vaccination by estimating not only the risk ratio but also the risk difference. There are few reports of adjusted risk differences. We evaluated the statistical performance of the adjusted risk difference and its variance under a post-authorization safety study’s settings (rare events, large sample, extreme exposure frequency).
METHODS
Adjusted risk differences were estimated using ordinary least squares estimators in a linear regression model with a binary outcome, and their variances were estimated using the standard error from ordinary least squares and four types of robust variance. In a simulation, we evaluated the risk differences’ performances using bias, coverage, and power and using data from the Vaccine Effectiveness, Networking, and Universal Safety study as an example of an actual post-authorization safety study.
RESULTS
The adjusted risk difference using ordinary least squares was not biased. Compared to the ordinary least squares’ standard error, the robust variance achieved more appropriate coverage and higher power. With actual data, including 2 × 2 tables of exposure and outcome with zero, both the ordinary least squares and robust variance could be estimated.
CONCLUSIONS
In post-authorization safety study settings, the estimation of the risk difference using ordinary least squares and robust variance showed better performance than the typical ordinary least squares. These findings may prove beneficial for reporting risk difference in extreme settings such as post-authorization safety studies.
In public health decision-making, it is crucial to present not only relative measures but also absolute measures1,2). The relative measures for binary outcomes are the risk ratio (RR) and odds ratio, and the absolute measure is the risk difference (RD). For example, consider the post-authorization safety study (PASS) of a vaccine. Assuming that the efficacy of a certain vaccine has been confirmed through clinical trials, let’s consider two scenarios: (1) the risk of adverse events (AEs) in the vaccinated group is 0.005% compared to 0.001% in the unvaccinated group, and (2) the AE risk in the vaccinated group is 0.5% versus 0.1% in the unvaccinated group. The RR is five times greater in both scenarios (risk in the vaccinated group/risk in the unvaccinated group), and the RD is 0.004% in the first scenario and 0.4% in the second scenario. Focusing only on the RR, the fivefold increase may seem high to many people, potentially leading to decisions against vaccination. However, considering the RD, the decision may vary between the two scenarios. With a small risk increase in the first scenario, the benefits of the vaccine might be prioritized, whereas in the second scenario, the larger risk increase might lead to a decision against vaccination. This illustrates that the RD is an essential indicator in decision-making for public health.
A PASS is an important feature of public health. However, due to the large size of the target population in a PASS, extremely high costs are involved, and the issue of vaccine hesitancy also necessitates the careful conduct of the research and the presentation of its results3). As is the case for many epidemiological studies and clinical trials, research into vaccine safety often presents only relative measures rather than absolute measures. Indeed, we randomly selected 25 studies from the 145 studies reported by the Vaccine Safety Datalink in the U.S.4), 15 of which were cohort studies; among these, only one study estimated an adjusted RD5).
An adjusted RD can be estimated using the ordinary least squares (OLS) method for a linear regression model and the maximum likelihood estimation (MLE) for a binomial regression model6). The typical OLS method, assuming a normal distribution of errors, allows for the construction of confidence intervals and hypothesis testing, but this assumption of normality deviates in the case of binary outcomes. Cheung proposed a method for estimating the RD using the OLS method with modified variance and described the simulations performed7); the results showed that, even if the assumption of normality for errors deviates, the RD and its confidence intervals can be reasonably estimated. However, the sample size of Cheung’s study was <500, the RD ranged from 0 to 0.3, and the exposure rate was between 0.5 and 0.7. In a PASS, the objective is to detect rare AEs that could not be detected in clinical trials (such as a few people per ten thousand or per hundred thousand), which requires a very large sample size and deals with extremely rare event occurrences and potentially extreme exposure frequency. Henceforth in this paper, we will refer to these extreme settings as “PASS settings.”
Cheung’s settings are greatly different from those in a PASS, and we therefore need to understand how statistical measures related to adjusted RDs behave in PASS settings. In the present study, as in Cheung’s research, we focused only on the estimation of RDs by using an OLS estimator in a linear regression model, which provides analytically stable estimates.
The objective of this study was to evaluate the statistical properties of RDs using the OLS method and modified standard errors under PASS settings. Through the statistical performance, we demonstrate that this method yields point estimates that are unbiased, correctly reflect uncertainty, and can accurately detect signals of AEs. In addition, using data from the Vaccine Effectiveness, Networking, and Universal Safety (VENUS) study which is being developed with the aim of creating a Japanese version of the Vaccine Safety Datalink, we applied the OLS and modified standard errors method to safety assessment studies of the Bacille Calmette-Guerin (BCG) vaccine for children and the 23-valent pneumococcal polysaccharide vaccine for older adults.
We consider a cohort study design with total sample size
(1) |
where
The hat denotes an estimated coefficient. Similarly, for the case of
The difference in these expected values is denoted as
The adjusted RD,
We consider point estimation in OLS. Here, let
In Model 1, the target parameter
The variance from OLS is defined as follows:
where
Even with heteroskedasticity, HC0 is asymptotically a consistent estimator of
We set four major scenarios that could occur in a PASS and evaluated the statistical properties of RD estimation when using OLS and four types of robust variance. All scenarios were based on cohort study designs, assessing the association between vaccination and subsequent AEs through the RD. The simulation settings, following Morris’s guidelines13), are shown in Table 1. The sample sizes were 1,000, 10,000, and 100,000, with the outcome prevalence in the absence of exposure set to 1/sample size, i.e., 0.001, 0.0001, and 0.00001.
Aims | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | |
---|---|---|---|---|---|
Estimate the RD and its SE using OLS or the Huber-White robust SE under possible situations in post-authorization safety studies. | |||||
Data-generating mechanisms | Common | ∙ Sample size: 1,000, 10,000, 100,000 ∙ Prevalence of event: 1/sample size ∙ True RD: Increased number of events due to exposure/sample size |
|||
Settings for each scenario | ∙ Increased number of events due to exposure: 0, 1, 2, 5, 10, 20, 30, 50 ∙ Exposure probability: 0.5 ∙ Adjusted variablea: Sex |
∙ Increased number of events due to exposure: 0, 5, 10, 20, 50 ∙ Exposure probability: 0.7, 0.8, 0.9, 0.95 ∙ Adjusted variablea: Sex |
∙ Increased number of events due to exposure: 0, 5, 10, 20, 50 ∙ Exposure probability: depending adjusted variables ∙ Adjusted variablesa: Sex, age |
∙ Increased number of events due to exposure: 0, 5, 10, 20, 50 ∙ Exposure probability: depending adjusted variables ∙ Adjusted variablesa: Sex, age, number of medical visits |
|
Estimands | RD, SE (95%CI) of risk difference | ||||
Methods | OLS, Huber-White robust SE | ||||
Performance measure | ∙ Point estimate, Bias, Relative bias, Empirical SE, MSE, Coverage ∙ Monte Carlo SE of Bias, Empirical SE, MSE, Coverage |
||||
Reasons for setting up the scenario | Assumes safety evaluation of vaccines for the older adults with a vaccination proportion of about 50%. No confounders are present. | Assumes safety evaluation of vaccines for children with a vaccination proportion of 80% or higher. No confounders are present. | Assumes safety evaluation of vaccines for children with a vaccination proportion of about 80%. Confounders are present. As there is little variation in age and frequency of medical visits among children, we set only one adjustable confounder (sex). | Assumes safety evaluation of vaccines for the elderly with a vaccination proportion of 50% or higher. Confounders are present. As there is variation in age and frequency of medical visits among the older adults, we set multiple adjustable confounders. |
a Causal structure and probability distribution of adjusted variables indicated Supplementary Figure S1. CI: confidence interval, MSE: mean squares error, OLS: ordinary least squares, RD: risk difference, SE: standard error.
We assumed that the number of AE occurrences increases by 0, 1, 2, 5, 10, 20, and 30 due to exposure. The true RD is calculated by subtracting the prevalence from the sum of the prevalence and the increase in event numbers per sample size (for example, if the sample size is 1,000, the prevalence is 0.001, and the increase in event numbers due to exposure is 10, then (0.001 + 10/1,000) – 0.001 = 0.01).
The adjustment factor common to all scenarios in Model 1 is sex, with a ratio of 1:1 between males and females. Based on the settings described, Scenario 1 assumes an exposure probability of 0.5. In Scenario 2, it is assumed that the majority of people are vaccinated, similar to pediatric vaccines, with exposure probabilities set to 0.7, 0.8, 0.9, and 0.95. Scenarios 3 and 4 adopt a matched cohort design in which unvaccinated individuals have a corresponding exposure date (index date) for matched vaccinated individuals. This setup allows for the measurement of confounders such as age at exposure and the number of healthcare visits before exposure, which are included in the model. Specifically, Scenario 3 is based on a pediatric vaccine, and Scenario 4 considers a vaccine for older adults. For detailed causal structures and probability distributions, see Supplementary Fig. S1.
The adjusted RD was taken as the point estimate obtained from OLS using the lm function in the R. The SE of the OLS was estimated from the lm function, and robust variance was estimated using the vcovHC function from R’s sandwich package14). The statistical properties were measured by the expected value of the point estimate, bias, empirical SE, mean squares error (MSE), coverage, and power. Since the focus is particularly on the estimation of the standard error, the coverage and power are crucial performance measures. Coverage is the probability that a confidence interval includes the true parameter value. By determining whether the confidence intervals have appropriate coverage, we can determine whether the estimated risk differences correctly reflect uncertainty. Assessing the power allows us to understand to what extent we can detect associations between AEs and vaccination under any given significance level. The occurrence of AEs and the presence or absence of exposure were cross-tabulated in this study, and cases including 0 cells were counted. Simply fitting a logistic regression model to data with zero cells fails to estimate the odds ratio. Since estimating robust variance is not possible with an event count of 0, we excluded cases with 0 events, and the number of iterations was set to 2,000. The implementation was done using R. The simulation code is available from the following GitHub: https://github.com/ShuntaroS/The-adjusted-risk-difference-estimation-for-sparse-data.
APPLICATION TO VENUS STUDY DATATo evaluate the post-market effectiveness and safety of various vaccines, the authors of the VENUS study constructed a database of administrative information on residents provided by municipalities, along with claims data and immunization registry data15). In this section, we describe how we applied the aforementioned methods to the data from the VENUS study to evaluate the safety of a BCG vaccination for children and a pneumococcal vaccination for older adults. AEs were identified based on the International Statistical Classification of Diseases and Related Health Problems 10 codes and the clinically relevant risk period following vaccination. Vaccination information and adjusted variables were obtained from the database. These analyses were conducted to confirm that the methods described herein work well; therefore, no clinical interpretations were made, and caution is required in the interpretation of our results. This study was approved by the Kyushu University Institutional Review Board for Clinical Research (approval no. 22114–05). The study used secondary data from administrative records of insurance users in cooperating municipalities. The data were originally collected by the municipalities for administrative purposes. Since obtaining informed consent from all subjects is impractical, Kyushu University has publicly disclosed information about the study and guaranteed an opportunity for individuals to opt out.
The performance of the point estimates is presented first. The expected value of the point estimates and Empirical SE are shown in Fig. 1, and values including bias and MSE are provided in Supplementary Table S1. Across all settings, the bias is small. The Empirical SE increases with smaller sample size and larger True RD values, but overall, the estimation accuracy is good. In Scenario 2, the exposure proportion was varied. As the exposure proportion approached 1, the Empirical SE increased slightly.
The point represents the expectation of RD and the error bar represents the Empirical SE (standard error). The different exposure proportions for Scenario 2 are represented by symbols and colors. RD: risk difference, SE: standard error.
The coverage obtained from the standard error is depicted in Fig. 2, and the values are provided in Supplementary Table S2. In Scenarios 1 and 4, the exposure proportion was around 0.5, while Scenarios 2 and 3 involved more extreme exposure proportions (0.7–0.95). The coverage for Scenario 2, where the exposure proportion was varied, is depicted in Supplementary Fig. S2. When the exposure proportion was close to 0.5, the performance of all standard errors was similar. The coverage was excessively high when the RD was null, dropped to around 0.9, and then approached the appropriate coverage of 0.95. In the cases of extreme exposure proportion, the trend in coverage between the OLS SE and robust variance differed greatly. The OLS SE transitioned from under-coverage to over-coverage as the RD increased. The robust variance showed over-coverage when the RD was null but then approached the appropriate coverage.
The point represents the coverage. The different variances are represented by symbols and colors. Because the robust variances (HC0, HC1, HC2, HC3, and HC4) are almost the same value, the lines overlap and only HC4 is visualized. HC: Huber-White, OLS: ordinary least squares, RD: risk difference.
Fig. 3 illustrates the power derived from the SE, and the values are presented in Supplementary Table S3. The power for Scenario 2, where the exposure proportion was varied, is depicted in Supplementary Fig. S3. When the exposure proportion was close to 0.5, the performance of all SEs was similar. However, when the RD was small, the power using robust variance was slightly higher. In cases of extreme exposure proportion, the trend in power between OLS SE and robust variance differed greatly. The power of OLS SE increased once the event count reached 50. Robust variance achieved approximately 70% power even with a small RD.
The point represents the power. The different variances are represented by symbols and colors. Because the robust variances (HC0, HC1, HC2, HC3, and HC4) are almost the same value, the lines overlap and only HC4 is visualized. HC: Huber-White, OLS: ordinary least squares, RD: risk difference.
The proportion of cases including zero cells is shown in Supplementary Table S1. When the exposure proportion was close to 0.5, the proportion of zero-cell cases ranged from approx. 0.6 to 0.8. In cases of extreme exposure proportion, the proportion of zero-cell cases ranged from approx. 0.7 to 0.95.
The Monte Carlo SE for the performance measures are provided in Supplementary Table S4. Since the Monte Carlo SE for coverage and power, derived from each robust variance, were almost identical, and only the results for HC0 are listed in Supplementary Table S4. This study focused particularly on the performance of the SE. From the maximum Monte Carlo SE obtained for coverage and power across all settings, the calculated number of necessary iterations ranged from 380 to 994. The number of iterations for this study was set at 2,000, which was deemed sufficient.
EXAMPLE 1: THE SAFETY OF A BCG VACCINE FOR CHILDREN IN THE VENUS STUDY DATAFor children, we evaluated the association between complications following immunization (International Statistical Classification of Diseases and Related Health Problems 10: T881, with a risk period is unlimited duration) as an AE and BCG vaccination using VENUS study data. Due to the small number of children analyzed, matching unvaccinated individuals to those vaccinated was not performed, and thus the only covariate included in the model was sex. A summary for each group is shown in Supplementary Table S5. The crude RD, the adjusted RD, and the 95% confidence intervals (CIs) calculated from each SE are presented in Table 2. Although the number of outcomes in the vaccinated group was zero, it was possible to estimate the adjusted RD with a CI. The 95%CIs differed slightly among the various variances.
Vaccinated, no. of incidence/no. of vaccinated (%) | Unvaccinated, no. of incidence/no. of unvaccinated (%) | Crude RD % (95%CI) | Adjusted RD % | Variance estimator | (95%CI) | |
---|---|---|---|---|---|---|
Association between BCG vaccine and complications following immunizationa | 4/678 (0.59) | 0/678 (0) | 0.590 (0.014, 1.166) |
0.591 | OLS: HC0: HC3: HC1, HC2, HC4: |
(0.0136 , 1.1686) (0.0131, 1.1691) (0.0118, 1.1704) (0.0125, 1.1697) |
Association between PPSV23 and ITPb | 1/114,080 (0.000877) | 7/299,256 (0.00234) |
−0.00146 (−0.00390, 0.00098) |
−0.00151 | OLS: HC0–HC4: |
(−0.00458, 0.00155) (−0.00381, 0.00079) |
a Unvaccinated participants were randomly matched 1:1 with the vaccinated participant. The vaccination date of the matched vaccinated person was used as the index date of the unvaccinated person. Adjusted model including sex.
b Adjusted model including sex, age at vaccination, and number of medical visits. BCG: Bacille Calmette-Guerin, CI: confidence interval, HC: Huber-White, ITP: immune thrombocytopenic purpura, OLS: ordinary least squares, PPSV23: 23-valent pneumococcal polysaccharide vaccine, RD: risk difference, VENUS: Vaccine Effectiveness, Networking, and Universal Safety.
We also used VENUS study data to evaluate the association between immune thrombocytopenic purpura (ITP) (International Statistical Classification of Diseases and Related Health Problems 10: D69.3, with a risk period of 42 days) as an AE after vaccination and a pneumococcal vaccine among older adults (aged >64 years). To assign a pseudo-index date, the unvaccinated, vaccinated, and unvaccinated individuals were randomly matched at approx. 1:3. The covariates included sex, age at vaccination, and the number of medical visits in the 6 months prior to vaccination. A summary for each group is provided in Supplementary Table S5. The crude RD, adjusted RD, and the 95%CIs are given in Table 2. The trend of estimation was similar to that of the BCG case described above.
PASSes are crucial for public health, and they are characterized by large sample sizes, few event incidences, and potentially extreme exposure frequency. The results of the present study demonstrate that even under such conditions, estimating adjusted RDs using OLS is valid, and the use of robust variance achieves appropriate coverage and high power.
We observed that the point estimates of adjusted RDs performed well across all settings. This aligns with theory, as the unbiasedness of OLS estimators is not affected by the assumption of normality in errors. We observed that when confounders are appropriately adjusted, the behavior of coverage and power varies according to the exposure proportion. In fact, the behavior of these metrics was similar between Scenarios 1 and 4 and between Scenarios 2 and 3. When the exposure proportion was close to 0.5, the performance of the OLS SE and that of robust variance were similar; however, from the perspective of power, robust variance was preferable. In the cases of extreme exposure proportions, the performance of robust variance was better, particularly in terms of power, where the difference between the OLS SE and robust variance was remarkable. Overall, the OLS SE’s coverage was overly broad, indicating that the CIs were excessively wide. This suggests a higher probability of crossing the null hypothesis of RD = 0, leading to low or even zero power in some conditions. Although Cheung did not discuss power, the performance of point estimates and coverage in this study is consistent with Cheung’s results, indicating that in PASS settings, the estimation of adjusted RDs is valid and the use of robust variance is preferable. Non-monotonic changes in coverage were observed in Scenarios 1 and 4; however, the reasons for this are unclear.
We also conducted a performance comparison of five types of robust SEs. In our study setting, there were no substantial differences in the values of the robust SEs. It is known that HC0, HC1, HC2, and HC3 are asymptotically equivalent. Cheung’s study covered sample sizes up to 500, with results indicating superior performance of HC3 in that range. Our findings suggest that for sample sizes ≥ 1,000, the various robust variance estimators may reach a sufficient sample size for asymptotic equivalence. Integrating the results from our study and Cheung’s research, we can conclude that irrespective of the sample size, it is advisable to use HC3 for robust variance estimation when estimating adjusted risk differences using OLS in linear regression models.
In this study, we estimated the adjusted RDs using OLS and estimated the 95%CIs using robust variance on actual PASS data. The estimation was successful for both the study of the BCG vaccine in children and the 23-valent pneumococcal polysaccharide vaccine in older adults. Even though the number of events in the vaccinated group for the BCG vaccine was zero, we were still able to estimate the RD with a CI. In the BCG vaccine study, the 95% CIs differed among the variances. The difference in variances is likely attributed to an insufficient sample size.
Different estimation methods yield different types of effects (estimand), and careful interpretation is required for effects estimated through a regression analysis16). In research aimed at effect estimation, researchers often seek to calculate an average treatment effect which compares risks between scenarios where all samples are exposed versus unexposed, or an average treatment effect in the treated population. The average treatment effect can be estimated through standardization or inverse probability weighting, while the average treatment effect in the treated population is typically estimated using matching methods. For these two estimands, the target population for the effect estimation is clearly defined. In contrast, the effect estimated using a regression model, as used in this study, compares exposed and unexposed cases within strata formed by covariate combinations, which is known as a conditional effect. Conditional effects assume no heterogeneity across strata consisting of covariate combinations—that is, the effects are equal across all strata—unless multiple terms between exposure and covariates are included in the regression model. When there is no heterogeneity in the sample and the regression model excludes interactions between exposure and covariates, the resulting estimates can be interpreted as the average treatment effect with a clear identification of the target population. When heterogeneity exists, the effects estimated through regression may be biased across all strata in the sample. Since it is unclear which population these effects represent, caution is needed when interpreting estimates obtained from regression models. In this study we conducted simulations under settings with no heterogeneity. From the perspective of variance estimators for each approach, under the assumption that all models are correctly specified, the outcome model-based estimator has a smaller asymptotic variance than the propensity score model-based estimator17). Under this assumption, we can consider the outcome model as having higher precision. However, it is important to note that, as in this study, it is essentially impossible to verify whether the models are correctly specified.
The estimation of RDs using OLS and robust variance demonstrated better performance than the typical OLS approach in PASS settings. While studies of PASS settings have reported mostly risk ratios, our present findings may be beneficial for researchers who are feeling uncertain about reporting adjusted RDs in extreme settings. Combined with Cheung’s results and our present results, the findings obtained herein suggest that estimating adjusted RDs using OLS in a linear regression model and interval estimation with robust SE using HC3 regardless of the sample size is valid for a wide range of medical fields. This could also be useful for researchers involved in public health and efficacy evaluations.
The authors declare they have no conflicts of interest regarding this study.
This research was supported by a grant from the Japan Agency for Medical Research and Development (AMED), no. JP21nf0101635.
We are sincerely grateful to Prof. Masataka Taguri (Tokyo Medical University) for the advice on statistical theory and to the Editor-in-Chief and the reviewers for their support in improving this manuscript. OpenAI’s ChatGPT was used to enhance the language and expression of the text in this manuscript.
VENUS data cannot be shared, for privacy/ethical reasons.
SS and YK developed the concept. MM, FM, and HF collected data. SS performed the data analysis. SS interpreted the results of the analyses. SS drafted the original manuscript. All authors have reviewed and edited the manuscript. All the authors have read the manuscript and approved its submission for publication.