Article ID: CJ-19-0320
Background: Cardiovascular guidelines include risk prediction models for decision making that lack the capacity to include novel predictors.
Methods and Results: We explored a new “predictor patch” approach to calibrating the predicted risk from a base model according to 2 components from outside datasets: (1) the difference in observed vs. expected values of novel predictors and (2) the hazard ratios (HRs) for novel predictors, in a scenario of adding kidney measures for cardiovascular mortality. Using 4 US cohorts (n=54,425) we alternately chose 1 as the base dataset and constructed a base prediction model with traditional predictors for cross-validation. In the 3 other “outside” datasets, we developed a linear regression model with traditional predictors for estimating expected values of glomerular filtration rate and albuminuria and obtained their adjusted HRs of cardiovascular mortality, together constituting a “patch” for adding kidney measures to the base model. The base model predicted cardiovascular mortality well in each cohort (c-statistic 0.78–0.91). The addition of kidney measures using a patch significantly improved discrimination (cross-validated ∆c-statistic 0.006 [0.004–0.008]) to a similar degree as refitting these kidney measures in each base dataset.
Conclusions: The addition of kidney measures using our new “predictor patch” approach based on estimates from outside datasets improved cardiovascular mortality prediction based on traditional predictors, providing an option to incorporate novel predictors into an existing prediction model.
Risk prediction is a central element of disease prevention and clinical management,1,2 mainly because therapies generally reduce the relative risk of the outcomes of interest in a consistent way across different subgroups. Thus, treating high-risk populations is more efficient than treating low-risk populations for the same level of an individual risk factor.3 For example, blood pressure reduction has been shown to reduce the risk of cardiovascular events by 15–20% regardless of the patient’s baseline risk and thus the number needed to treat for preventing an event is much lower in the case of treating higher vs. lower risk patients (e.g., 26 in 5 years if predicted risk >21% vs. 71 if <11%).3 Indeed, several major clinical guidelines incorporate risk prediction models based on traditional risk factors (e.g., blood pressure and diabetes for cardiovascular disease) for clinical decision making.1,2,4
Unfortunately, none of the existing cardiovascular risk prediction tools in major clinical guidelines have the practical ability to incorporate novel biomarkers, despite the fact that several promising biomarkers have been identified (e.g., coronary artery calcium for coronary disease risk prediction5). Some of these promising predictors may be expensive to measure and cannot be widely incorporated. However, others are routinely assessed in clinical practice, but the base datasets from which risk prediction models in clinical guidelines were derived often do not have data on those promising predictors, precluding the inclusion of novel predictors in established risk prediction models.5
In this situation, some guidelines categorize specific groups of patients as high-risk outside of a risk prediction scheme (e.g., diabetic patients with albuminuria are considered to be at very high cardiovascular risk in a European guideline6) (Supplementary Figure 1). However, this approach does not take into account risk variation because of other risk factors or provide any guidance to patients not included in these specific groups (e.g., non-diabetic patients with very high albuminuria). As an alternative approach, some investigators have developed their own risk prediction models with novel predictors to overcome this situation. However, given the difference in the baseline risk of the datasets that were used to derive the guideline prediction models and the investigators’ models, those 2 models may provide quite different predicted risk for the same person, which may well lead to lack of uptake by clinicians.
To overcome these issues, we propose a new approach, using a “predictor patch” based on data from outside datasets, to add non-traditional predictors to a base prediction model without remodeling non-traditional predictors in the base dataset. Our approach calibrates predicted risk based on 2 elements (green text in Figure 1): (1) the difference between the observed value and expected value of an additional predictor (e.g., if a predictor has a positive association with an outcome, individuals with higher than expected values of this additional predictor would have a higher risk than predicted solely based on traditional predictors) and (2) the hazard ratio (HR) of an outcome of interest related to the difference in an additional predictor (e.g., calibration for a difference between observed vs. expected would be larger if the association of an additional predictor with an outcome is stronger). Expected values and HRs are obtained from outside datasets.
Figure 1. Conceptual scheme for calibrating the risk based on (1) the difference between observed and expected values of CKD measures and (2) the relative risk according to CKD measures. CKD, chronic kidney disease.
To explain our methodology we chose a scenario of wanting to add 2 chronic kidney disease (CKD) measures (glomerular filtration rate [GFR] and urine albumin-to-creatinine ratio [ACR]7) to a base risk prediction model with traditional predictors in the context of cardiovascular (CV) mortality risk. This is a practical example, because we have previously shown that these kidney markers do, indeed, add to CV risk prediction.8 Also, major CV disease guidelines acknowledge these kidney measures as important predictors, but none of the established risk models in those guidelines include these kidney markers. In addition, both GFR and albuminuria are often measured in clinical practice.8,9
To test our novel approach and to support reasonable generalizability, we used data from 4 US community-based cohort studies, each of which had data on traditional CV predictors, kidney measures, and CV mortality: the Atherosclerosis Risk in Communities (ARIC) study, the Multi-Ethnic Study of Atherosclerosis (MESA), the National Health and Nutrition Examination Survey (NHANES) III, and NHANES 1999–2010. Details of each study have been reported previously.10–12 Briefly, ARIC enrolled 15,792 mostly white and black men and women from 4 US communities (Washington County, Maryland; suburban Minneapolis, Minnesota; Jackson, Mississippi; and Forsyth County, North Carolina) in 1987–1989. For the present study, we used data from 9,351 participants who attended visit 4 (1996–1998, age range 52–75 years) and were free of a history of CV disease (coronary artery disease and stroke). MESA enrolled 6,814 participants aged 45–84 years without a history of CV disease in 2000–2002 from 6 US communities: Los Angeles County, California; Chicago, Illinois; Baltimore, Maryland; St. Paul, Minnesota; northern New York City, New York; and Forsyth County, North Carolina. MESA was designed to include white, black, Hispanic, and Chinese participants. For the present study, we included 6,704 participants with data on kidney measures and traditional risk factors as detailed below. For NHANES, we included white, black, and Hispanic men and women without a history of CV disease and with data on GFR and albuminuria, comprising 14,103 participants from NHANES III and 24,267 participants from NHANES 1999–2010.
Estimated GFR (eGFR) was calculated using the CKD-EPI creatinine equation in all 4 studies.7,13 Serum creatinine was measured using a modified kinetic Jaffé method in ARIC,14 rate reflectance spectrophotometry using thin film adaptation of the creatine amidinohydrolase method in MESA,10 and a kinetic Jaffé method in NHANES.12 Urine albumin was measured by nephelometry in ARIC,14 the Array 360 CE Protein Analyzer in MESA,10 and solid-phase fluorescent immunoassay in NHANES.12 Urine creatinine was measured using the Jaffé method in ARIC and NHANES, and by the Vitros 950IRC instrument in MESA.
We considered all the predictors used in the American Heart Association (AHA) and the American College of Cardiology (ACC) Pooled Cohort Equation as traditional risk factors: age, sex, race/ethnicity, systolic blood pressure (SBP), use of antihypertensive medications, diabetes, total cholesterol (TC) and high-density lipoprotein cholesterol (HDL-C), and smoking.5 Data collection of these traditional predictors was conducted according to a standard protocol in each study, as reported previously.10–12 Race/ethnicity was categorized as white, black, Hispanic, and Asian. Diabetes mellitus was defined as a fasting glucose ≥7.0 mmol/L, self-reported history of diabetes, or use of glucose-lowering medications. Smoking status was dichotomized as current vs. former/never.
Given its consistent availability in all 4 cohorts, the outcome of interest was CV mortality defined as death from myocardial infarction, stroke, heart failure, or sudden cardiac death.8
Analyses were conducted using Stata/MP 14 (www.stata.com). A P-value <0.05 was considered significant. Baseline characteristics were summarized across the 4 cohorts as mean (SD) or median [interquartile interval, IQI] for continuous variables and as number (%) for categorical variables.
A scheme for developing a “patch” in outside datasets and applying it to a base dataset is shown in Supplementary Figure 2. We first selected 3 outside datasets and developed a linear regression model for estimating expected values of eGFR and log-ACR based on traditional predictors (ARIC, NHANES III, and NHANES 1999–2010 in Supplementary Figure 2 as an example). Then, we obtained the log HR (β) for CV mortality for eGFR (βeGFR) and log-ACR (βlog-ACR), adjusted for traditional predictors. βeGFR and βlog-ACR were estimated in each of the 3 outside datasets first and then meta-analyzed using fixed-effect models. In estimating the log HRs, we fixed the log HRs for the traditional predictors at their values from the base dataset. The linear model for estimating expected eGFR and ACR according to traditional predictors and the log HR for eGFR and ACR together from outside datasets constituted the “CKD patch”.
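The pooling step described above can be sketched as follows. This is a minimal illustration that assumes the fixed-effect meta-analysis used standard inverse-variance weighting, which the text does not state explicitly. The function name `fixed_effect_pool` and all numeric inputs are hypothetical and are not taken from the study data.

```python
import math


def fixed_effect_pool(log_hrs, std_errs):
    """Inverse-variance fixed-effect pooling of per-dataset log hazard ratios.

    Assumption: each outside dataset contributes one log HR and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical log HRs for log-ACR from 3 outside datasets (illustrative only).
log_hrs = [math.log(1.4), math.log(1.5), math.log(1.6)]
std_errs = [0.10, 0.10, 0.10]

pooled, pooled_se = fixed_effect_pool(log_hrs, std_errs)
pooled_hr = math.exp(pooled)
ci_low = math.exp(pooled - 1.96 * pooled_se)
ci_high = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled HR {pooled_hr:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

With equal standard errors the pooled log HR reduces to the simple mean of the three estimates; datasets with smaller standard errors would otherwise receive more weight.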
Next, we applied the “CKD patch” to a base dataset (MESA in Supplementary Figure 2) and calibrated the predicted risk according to the difference between observed kidney measures in the base dataset vs. expected kidney measures (based on traditional predictors) and the log HR for the kidney measures in each participant in the base dataset. The formula for this calibration was:
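$$h_i(t)_{\text{new}} = h_i(t)_{\text{original}} \times \exp\left(\beta_{\text{eGFR}} \times [\text{observed eGFR}_i - \text{expected eGFR}_i] + \beta_{\text{log-ACR}} \times [\text{observed log-ACR}_i - \text{expected log-ACR}_i]\right) \quad \text{(eqn. 1)}$$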
where i indicates person i in the base dataset; h(t)new, the calibrated hazard incorporating observed values of CKD measures; h(t)original, the original predicted hazard with traditional predictors in the base dataset; βeGFR (βlog-ACR), the log HR of eGFR (log-ACR) from the 3 outside datasets; observed eGFRi (log-ACRi), the observed value of eGFR (log-ACR) in person i in the base dataset; and expected eGFRi (log-ACRi), the expected value of eGFR (log-ACR) based on observed traditional predictors in person i in the base dataset according to a linear model from the 3 outside datasets.
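As a rough illustration of equation 1, the calibration for one person can be sketched as below. The function names and all numeric values are hypothetical. The conversion from a hazard multiplier to a calibrated risk is an added assumption that relies on proportional hazards (the survival probability is raised to the power of the hazard multiplier); it is not a step spelled out in the text.

```python
import math


def patch_shift(beta_egfr, obs_egfr, exp_egfr,
                beta_log_acr, obs_log_acr, exp_log_acr):
    """Linear-predictor shift inside exp() in equation 1."""
    return (beta_egfr * (obs_egfr - exp_egfr)
            + beta_log_acr * (obs_log_acr - exp_log_acr))


def patched_hazard(h_original, shift):
    """Equation 1: calibrated hazard = original hazard * exp(shift)."""
    return h_original * math.exp(shift)


def patched_risk(risk_original, shift):
    """Assumption: under proportional hazards, S_new = S_original ** exp(shift)."""
    return 1.0 - (1.0 - risk_original) ** math.exp(shift)


# Hypothetical person: eGFR lower than expected, ACR higher than expected.
shift = patch_shift(beta_egfr=-0.01, obs_egfr=50.0, exp_egfr=80.0,
                    beta_log_acr=math.log(1.5), obs_log_acr=2.0, exp_log_acr=1.0)
print(patched_hazard(0.02, shift))   # calibrated hazard
print(patched_risk(0.10, shift))     # calibrated 10-year risk
```

When the observed kidney measures equal their expected values, the shift is zero and the original prediction is returned unchanged, which is the behavior the patch is meant to have.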
It is important that the expected value of the kidney measures was derived based on the traditional predictors, because the coefficients of the traditional predictors in the base model may include residual confounding by the kidney measures. For example, the coefficient for hypertension will include the risk associated with the higher average level of albuminuria in hypertensive individuals, but not the risk associated with individual deviations in albuminuria from this average.
To evaluate the performance of the “CKD patch”, we plotted predicted and observed risk from the base model with only traditional predictors as well as when we used the “CKD patch”. Also, we compared the “CKD patch” to a fully refit model incorporating both traditional predictors (i.e., age, sex, race/ethnicity, SBP, use of antihypertensive medications, diabetes, TC and HDL-C, and smoking) and kidney measures (i.e., eGFR and ACR) in the base dataset. This simulated the scenario in which the base dataset has data of non-traditional predictors. We also quantified the difference in Harrell’s c-statistic15 among the base model, the “CKD patch”, and the fully refit model. In addition, we assessed categorical net reclassification improvement (NRI),16 with risk categories of 5% and 10% in 10 years.4 We repeated the entire process for each cohort as the base dataset (and the remaining 3 cohorts as outside datasets) for cross-validation.
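The discrimination comparison can be illustrated with a simple pairwise version of Harrell’s c-statistic. This is a sketch only: it uses an O(n²) loop, skips pairs with tied follow-up times, and the toy data are hypothetical. The study itself used Stata, whose implementation may handle ties differently.

```python
def harrell_c(times, events, risks):
    """Harrell's c: share of usable pairs in which the person who had the
    event earlier also had the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical follow-up data: time in years, event flag, predicted risks.
times = [2, 5, 6, 8, 9]
events = [1, 1, 0, 1, 0]
base_risks = [0.30, 0.10, 0.20, 0.15, 0.05]
patched_risks = [0.35, 0.22, 0.20, 0.15, 0.05]

c_base = harrell_c(times, events, base_risks)
c_patched = harrell_c(times, events, patched_risks)
delta_c = c_patched - c_base
print(c_base, c_patched, delta_c)
```

The difference `delta_c` corresponds to the ∆c-statistic contrasted between the base model and the patched model; confidence intervals would require an additional step such as bootstrapping, which is not shown.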
Characteristics of each study are summarized in Table 1. The average age was similar between ARIC and MESA (∼62 years) and between the 2 NHANES cohorts (∼45 years). Approximately 20–30% were black across the 4 cohorts. Reflecting age differences, risk factor profiles were generally better (lower prevalence of diabetes, lower BP, higher kidney function) in the NHANES cohorts than in the other 2 cohorts. However, ACR was higher on average in the NHANES cohorts than in ARIC and MESA. The prevalence of current smokers was higher in the NHANES cohorts than in the other 2 cohorts, but the combined prevalence of current and former smokers was similar in all 4 cohorts. We did not see an evident difference in TC or HDL-C across the cohorts.
Numbers represent percent (except N indicating total sample size), mean (SD), or median [interquartile interval]. ACR, urine albumin-to-creatinine ratio; ARIC, Atherosclerosis Risk in Communities Study; eGFR, estimated glomerular filtration rate; HDL-C, high-density lipoprotein cholesterol; MESA, Multi-Ethnic Study of Atherosclerosis; NHANES, National Health and Nutrition Examination Survey; SBP, systolic blood pressure; TC, total cholesterol.
In the 3 outside datasets, using traditional predictors, we developed a linear regression model to estimate expected levels of eGFR and log-ACR conditional on the traditional predictors (estimation model) (Supplementary Tables 1,2). Next, we applied each estimation model to the base dataset. The root mean square errors ranged from 13.91–15.67 for eGFR (Supplementary Table 1) and 0.50–0.73 for log-ACR (Supplementary Table 2).
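Applying an estimation model to a base dataset and summarizing its fit can be sketched as follows. The coefficients, predictor names, and participant values below are hypothetical placeholders and are not the values reported in the Supplementary Tables; only a subset of traditional predictors is shown for brevity.

```python
import math


def expected_value(intercept, coefs, predictors):
    """Expected kidney measure from a linear model fitted in outside datasets."""
    return intercept + sum(coefs[name] * predictors[name] for name in coefs)


def rmse(observed, expected):
    """Root mean square error between observed and expected values."""
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical eGFR estimation model (illustrative coefficients only).
intercept = 120.0
coefs = {"age": -0.8, "sbp": -0.02, "diabetes": 1.5}

# Hypothetical base-dataset participants with observed eGFR.
participants = [
    {"age": 60, "sbp": 130, "diabetes": 0, "egfr": 55.0},
    {"age": 50, "sbp": 120, "diabetes": 1, "egfr": 95.0},
]

expected = [expected_value(intercept, coefs, p) for p in participants]
observed = [p["egfr"] for p in participants]
residuals = [o - e for o, e in zip(observed, expected)]
print(expected, residuals, rmse(observed, expected))
```

The residuals (observed minus expected) are the quantities that enter the patch, so a model with some remaining residual variation is what gives the patch information to work with.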
Results for expected values of CKD measures (based on an estimation model from MESA, NHANES III, and NHANES 1999–2010) and their observed values in ARIC are shown in Figure 2. A number of participants had considerably lower eGFR and higher ACR than expected, indicating that, in those individuals, predicted risk based on only traditional predictors was likely to underestimate their actual risk. The opposite is true for participants with considerably higher eGFR or lower ACR than expected. In the scenario shown in Figure 2, 29% of ARIC participants had an eGFR that was 15 mL/min/1.73 m2 lower or higher than expected and 17% had an ACR that was 8-fold higher or lower than expected. Results for the other 3 scenarios with each of MESA, NHANES III, and NHANES 1999–2010 as the base dataset are shown in Supplementary Figures 3–5.
Figure 2. Scatter plot of residual (observed minus expected) vs. expected eGFR and log₈-ACR in ARIC. Expected values are based on a linear regression model from the other 3 cohorts (MESA, NHANES III, and NHANES 1999–2010), regarded as outside datasets. ACR, urine albumin-to-creatinine ratio; ARIC, Atherosclerosis Risk in Communities; eGFR, estimated glomerular filtration rate; MESA, Multi-Ethnic Study of Atherosclerosis; NHANES, National Health and Nutrition Examination Survey.
Based on meta-analysis of 3 outside datasets, both kidney measures were generally associated with CV mortality independently of each other and traditional CV risk factors (Table 2). However, the results for lower eGFR were not necessarily consistent across the 4 combinations of 3 outside datasets. For example, 15 mL/min/1.73 m2 lower eGFR in the range <60 mL/min/1.73 m2 was significantly associated with CV mortality except when MESA, NHANES III, and NHANES 1999–2010 were treated as the 3 outside datasets, whereas lower eGFR in the range 60–90 mL/min/1.73 m2 showed a significant association only with MESA, NHANES III, and NHANES 1999–2010 or ARIC, MESA, and NHANES 1999–2010 as the outside datasets. In contrast, ACR was consistently related to higher risk of CV mortality regardless of the combination of outside datasets, with adjusted HRs between 1.4 and 1.6 per 8-fold higher values. Supplementary Table 3 shows the HRs for kidney measures from each dataset used for the meta-analyzed HRs in Table 2. Supplementary Table 4 summarizes the adjusted HRs of the kidney measures from a fully refit model including traditional predictors in each dataset.
CI, confidence interval. Other abbreviations as in Table 1.
Using estimation models of eGFR and ACR, as well as the adjusted HRs of CV mortality for these 2 kidney measures from 3 outside datasets, we calibrated predicted risk according to observed values of eGFR and ACR in every participant in a base dataset using equation 1. Figure 3 demonstrates the predicted vs. observed risk in each scenario when ARIC, MESA, NHANES III, and NHANES 1999–2010 served as a base dataset respectively, and the other 3 were treated as the outside datasets. Black dots represent predicted vs. observed risk for a base model with only traditional risk factors in the base dataset. Red dots reflect the calibrated risk prediction based on the “CKD patch” (“base model + CKD patch” in Figure 3), whereas blue dots represent predicted risk using a model including both traditional predictors and the 2 kidney measures in the base dataset (“fully refit model” in Figure 3).
Figure 3. Calibration plot (observed vs. predicted risk) for cardiovascular mortality risk for 3 models. The “base model” indicates a model with traditional predictors in a base dataset, “base model + CKD patch” represents a calibrated predicted risk using a CKD patch, and “fully refit model” reflects a model with traditional predictors and CKD measures in a base dataset. Abbreviations as in Figures 1,2.
Overall, all 3 lines were around the diagonal line of identity, indicating overall good calibration. In all 4 scenarios, the highest risk decile based on the “CKD patch” (red dot) shifted towards the upper right corner from the highest risk decile for the base model (black dot), indicating that the addition of kidney measures with the “CKD patch” contributed to identifying individuals at higher risk than originally predicted by the base model with only traditional predictors, and indeed they had higher risk. In general, the shift for the remaining risk deciles was less evident than that for the highest risk decile. The patterns for the fully refit models (blue dots) were largely similar to those obtained with the “CKD patch” (red dots).
Subsequently, we contrasted the c-statistics across 3 approaches (base model, base model + “CKD patch”, and fully refit model) in the 4 scenarios (ARIC, MESA, NHANES III, and NHANES 1999–2010) of a base dataset (Table 3). C-statistics ranged from 0.779 to 0.909 for the model with traditional predictors in the base model in each study. The fully refit models showed modest but significant improvement of c-statistics in all the scenarios except MESA. The “CKD patch” significantly improved the risk discrimination of CV mortality from the base model in all studies except MESA as well. The degree of improvement by the “CKD patch” was largely similar to fully refit models in all studies, with the pooled cross-validated difference in c-statistic of 0.006 (95% confidence interval 0.004–0.008) in both approaches. We also confirmed a significantly positive NRI with the CKD patch approach (Supplementary Table 5).
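For readers less familiar with the categorical NRI, a minimal sketch using the 5% and 10% cut points is given below. It assumes event status at 10 years is known for everyone and therefore ignores censoring, which the actual analysis may have handled differently; the function names and toy data are hypothetical.

```python
from bisect import bisect_right

CUTPOINTS = (0.05, 0.10)  # risk categories: <5%, 5% to <10%, >=10%


def risk_category(risk, cutpoints=CUTPOINTS):
    """Index of the risk category that a predicted risk falls into."""
    return bisect_right(cutpoints, risk)


def categorical_nri(old_risks, new_risks, events, cutpoints=CUTPOINTS):
    """NRI = [P(up|event) - P(down|event)] + [P(down|no event) - P(up|no event)]."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, event in zip(old_risks, new_risks, events):
        move = risk_category(new, cutpoints) - risk_category(old, cutpoints)
        if event:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical predicted 10-year risks before and after applying a patch.
old_risks = [0.04, 0.08, 0.12, 0.03]
new_risks = [0.06, 0.12, 0.08, 0.03]
events = [1, 1, 0, 0]

nri = categorical_nri(old_risks, new_risks, events)
print(nri)
```

A positive value indicates that, on balance, people with events moved to higher risk categories and people without events moved to lower ones.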
“Base model” indicates a model with traditional predictors in a base dataset; “base model+CKD patch” represents a calibrated predicted risk using the CKD patch; “fully refit model” reflects a model with traditional predictors and the CKD measures in the base dataset. Other abbreviations as in Tables 1,2.
Here, we have demonstrated a new approach, using a “predictor patch” based on estimates from outside datasets, to incorporating non-traditional predictors into a base model without fully refitting both non-traditional and traditional predictors in the base dataset from which the base prediction model was derived. In the scenario of adding kidney measures, our new approach with a “CKD patch” demonstrated risk prediction improvement compared with the base model. The degree of improvement was similar to that of fully refit models, although the two were not identical in each study. Importantly, we cross-validated our new approach in 4 scenarios by treating 1 of 4 US community-based cohorts as the base dataset and the remaining 3 as the outside datasets in turn. This approach is an option to take into account non-traditional predictors in clinical practice when it is not practical to develop a new model in datasets used to derive relevant existing prediction models. For example, kidney measures are not uniformly measured in the cohorts that derived the AHA/ACC Pooled Cohort Equations.5
We are not aware of any previous attempts to use the approach explored in this study to implement new predictors in existing prediction models when the relevant derivation datasets do not have data on those predictors. Theoretically, our approach can be applied to any predictors and any outcomes. Our approach would be particularly helpful when predictors are already collected in clinical practice, but healthcare providers have not been able to effectively incorporate the information of those predictors. This is exactly the scenario for kidney measures tested here. Indeed, although source data might be slightly old, it is estimated that serum creatinine for eGFR is measured approximately 300 million times in the USA every year.17 In addition, the assessment of albuminuria is recommended in patients with diabetes, hypertension, and CKD.7,18,19 However, none of the major clinical guidelines has adopted CV risk prediction tools incorporating these kidney measures.
There are some conditions for our approach to be effective in settings beyond what we tested here for CV mortality by adding CKD measures. First, strong associations between non-traditional predictors and outcomes of interest would be required, as is true for any prediction model.20 Second, the estimation of expected values of non-traditional predictors using traditional predictors should fit reasonably well, but should have enough residual variance to model a difference between observed and expected values. If this estimation is perfect without any residual variation, then the information in the additional predictors is fully incorporated into the existing risk factors and there is no need for a patch. However, it seems extremely rare to see such a perfect estimation for most predictors.21
All cohorts were from the USA and community-based, and thus whether and to what extent our new approach works in other regions, countries, or specific clinical populations should be tested in future studies. In this context, the CV mortality risk prediction tested in this study was particularly relevant to Europe because the European Society of Cardiology Guideline for Cardiovascular Prevention is based on predicted risk of CV mortality using SCORE.6 Given our experience and the finding of strong associations, we explored kidney measures and CV mortality for this proof-of-concept study.8,9 Thus, confirmatory studies for other non-traditional predictors and other outcomes are needed. Also, we cannot say how many predictors can be appropriately added using this patch approach. Although we explored a scenario where 2 predictors were added to a base model, in principle there is no limit to how many new predictors could be added. The practical utility of adding more predictors will depend on correlations among additional and traditional predictors, as well as the degree of confounding by additional predictors. This should be explored in future studies.
We have demonstrated a new approach, “predictor patch” based on estimates from outside datasets, to incorporating additional predictors into an existing prediction model without remodeling the non-traditional predictors in its derivation base dataset. Although confirmatory investigations are needed, theoretically, the “predictor patch” approach described here can be applied to a wide range of settings and will allow researchers and healthcare providers to efficiently adopt additional predictors for improving risk prediction in the context of existing prediction models.
The ARIC study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under Contract nos. HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700005I, HHSN268201700004I. MESA was supported by National Heart, Lung, and Blood Institute contracts N01-HC-95159–N01-HC-95169 and National Center for Research Resource Grants UL1-RR-024156 and UL1-RR-025005. The authors thank the staff and participants of the ARIC study for their important contributions. The authors also thank the other investigators, the staff, and the participants of MESA for their valuable contributions. A full list of participating MESA investigators and institutions can be found at http://www.mesa-nhlbi.org.
K.M. received consultancy and research funding from Kyowa Hakko Kirin outside of the work and consultancy from Akebia and Healthy.io outside of the work.