Some Methodological Issues and Future of Risk Prediction Studies in Cardiology

Hideki Origasa

doi:10.1253/circj.CJ-16-0382

Risk prediction modeling probably dates back to the article by Wilson et al in 1987.¹ In the current era of personalized medicine, studies of prediction models are abundant. The well-known CHADS2 score and others have been validated by risk prediction analysis,²⁻⁴ and with the increasing interest in biomarkers and in cellular and genetic markers, further growth in this area is anticipated. As shown in Table 1, there are several methods of performing risk prediction. In this issue of the Journal, Yatsuya et al⁶ report using a risk equation approach, as in the Framingham heart attack risk score,⁵ which may be the most frequently applied method. Because risk prediction has become an important substudy in cohort and registry studies, the GRIPS⁷ and TRIPOD⁸ guidelines have been published to strengthen the conduct and reporting of such studies.

Table 1. Methods of Conducting the Risk Prediction of Disease

1. Risk equations (eg, Framingham heart attack risk score⁵)

2. Risk scores (eg, CHADS2 score²)

3. Nomograms

4. Decision trees such as classification and regression tree (CART)

5. Machine learning for disease prediction using biological neural network
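
To make the "risk score" entry in Table 1 concrete, the following is a minimal sketch of an additive point score, using the standard CHADS2 point assignment (1 point each for congestive heart failure, hypertension, age 75 years or older, and diabetes mellitus; 2 points for prior stroke or transient ischemic attack). The function name and argument names are illustrative assumptions, not part of any published software.

```python
def chads2_score(chf, hypertension, age, diabetes, prior_stroke_or_tia):
    """Return the CHADS2 point total (0 to 6) for one patient.

    Argument names are hypothetical; each clinical item is passed as a
    boolean, except age, which is given in years.
    """
    score = 0
    score += 1 if chf else 0                  # C: congestive heart failure
    score += 1 if hypertension else 0         # H: hypertension
    score += 1 if age >= 75 else 0            # A: age 75 years or older
    score += 1 if diabetes else 0             # D: diabetes mellitus
    score += 2 if prior_stroke_or_tia else 0  # S2: prior stroke or TIA
    return score


if __name__ == "__main__":
    # A 70-year-old with hypertension only
    print(chads2_score(False, True, 70, False, False))
```

A risk equation differs from such a score in that it weights each predictor by an estimated regression coefficient rather than by integer points.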

Article p 1386

As shown in Table 2, there are 4 questions that should be asked when developing a new risk equation. Discrimination measures how accurately the risk prediction equation distinguishes people who will develop disease from those who will not, and this overall ability is typically described by the c-statistic.⁹ When a new marker is added to a model, the change in the c-statistic indicates how much discrimination has improved. However, the c-statistic is insensitive to small improvements in model performance when the model already includes important predictors. A moderately strong association between exposure and outcome (say, odds ratio >3.0) is necessary for the addition of a new risk factor to meaningfully improve overall discrimination.¹⁰ An unpublished document by Pencina gives a real example of this for the development of a first coronary event (https://www.lerner.ccf.org/qhs/outcomes/documents/pencina.pdf). The initial model comprised age, sex, diabetes mellitus, smoking status, systolic blood pressure, and total cholesterol as the standard risk factors. High-density lipoprotein (HDL) cholesterol was the new marker. It was highly significant (hazard ratio 0.65, P<0.0001), even with all other variables kept in the model. However, the increase in the c-statistic with inclusion of HDL cholesterol was minimal (ie, from 0.762 to 0.774; P=0.092).
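
As an illustration of what the c-statistic measures, the sketch below computes it for a binary outcome as the proportion of (event, non-event) pairs in which the person with the event received the higher predicted risk, counting ties as one-half. This is a simplified version under the assumption of no censoring; Harrell's c-statistic for survival data additionally restricts the comparison to usable pairs. The function name and the toy data are hypothetical.

```python
def c_statistic(risks, outcomes):
    """Concordance for a binary outcome (1 = event, 0 = no event).

    Returns the fraction of event/non-event pairs in which the event
    has the higher predicted risk; tied risks count as 0.5.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


if __name__ == "__main__":
    # Toy predicted risks (hypothetical, not from any study)
    risks = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
    outcomes = [1, 1, 1, 0, 0, 0]
    print(round(c_statistic(risks, outcomes), 3))
```

Because the statistic depends only on the rank order of predicted risks, a marker that shifts risks without reordering many event/non-event pairs changes it very little, which is consistent with the insensitivity described above.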

Table 2. Important Questions to Ask to Validate a Risk Score Equation

1. How accurate is the score?

　Sensitivity, Specificity

2. How well can the score discriminate?

　ROC curve, Harrell’s c-statistic

3. Is the score correctly calibrated?

　Agreement between observed and predicted values

　Hosmer-Lemeshow goodness-of-fit statistic

　Net reclassification improvement (NRI)

　Integrated discrimination improvement (IDI)

4. Is the score generalizable?

　Internal/external/temporal validation, cross-validation, bootstrapping
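
For the calibration question in Table 2, agreement between observed and predicted values is commonly summarized by the Hosmer-Lemeshow statistic. The sketch below is one simple way to compute it: sort persons by predicted risk, split them into equal-sized groups, and sum (observed - expected)² / [n × p̄ × (1 - p̄)] over the groups, where p̄ is the mean predicted risk in the group. The function name, the grouping rule, and the toy data are assumptions for illustration.

```python
def hosmer_lemeshow(risks, outcomes, groups=10):
    """Return (chi-square statistic, per-group table).

    Each table row is (group size, observed events, expected events).
    Groups are formed from persons sorted by predicted risk.
    """
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    chi2 = 0.0
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer persons than groups
        size = len(chunk)
        expected = sum(r for r, _ in chunk)   # sum of predicted risks
        observed = sum(y for _, y in chunk)   # number of events
        mean_risk = expected / size
        denom = size * mean_risk * (1.0 - mean_risk)
        if denom > 0:
            chi2 += (observed - expected) ** 2 / denom
        table.append((size, observed, expected))
    return chi2, table


if __name__ == "__main__":
    # Toy data (hypothetical): two risk strata of 10 persons each
    risks = [0.1] * 10 + [0.5] * 10
    outcomes = [1] + [0] * 9 + [1] * 5 + [0] * 5
    chi2, table = hosmer_lemeshow(risks, outcomes, groups=2)
    print(chi2, table)
```

A small statistic indicates that observed event counts are close to those predicted; in development data it is conventionally referred to a chi-square distribution with (number of groups - 2) degrees of freedom.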

Because the usefulness of discrimination measures has been questioned, research has focused on finding a quantity that measures how well persons with and without the outcome are classified relative to the basic model. It is usually referred to as a measure of calibration, and net reclassification improvement (NRI) is the most frequently applied such measure.¹¹ It is more sensitive and interpretable than the discrimination measure, and it quantifies the number of persons correctly reclassified into clinically meaningful higher or lower risk categories with the addition of a new predictor. Integrated discrimination improvement (IDI) is a continuous version of NRI in which probability differences are used instead of categories.¹¹ For example, the CHADS2 score produced a c-statistic of 0.641 for ischemic stroke, which was exactly the same as that obtained using the CHADS2 and HAS-BLED scores together.¹² However, NRI increased by 4.7% when the HAS-BLED score was added to CHADS2, and IDI increased by 1.1%.¹² Thus, calibration measures such as NRI and IDI ought to be applied more frequently than the c-statistic.
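
The two measures can be sketched as follows, using the definitions of Pencina et al.¹¹ Categorical NRI is the net proportion of events moved to a higher risk category plus the net proportion of non-events moved to a lower one; IDI is the change in mean predicted risk among events minus the change among non-events. The function names, the risk cutoffs (10% and 20%), and the toy data are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def risk_category(risk, cutoffs):
    """Index of the risk category; a risk equal to a cutoff falls in the higher category."""
    return bisect_right(cutoffs, risk)


def nri(old_risks, new_risks, outcomes, cutoffs):
    """Categorical net reclassification improvement."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = sum(outcomes)
    n_nonevent = len(outcomes) - n_event
    for old, new, y in zip(old_risks, new_risks, outcomes):
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if y == 1:
            up_event += move > 0
            down_event += move < 0
        else:
            up_nonevent += move > 0
            down_nonevent += move < 0
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)


def idi(old_risks, new_risks, outcomes):
    """Integrated discrimination improvement."""
    def mean(values):
        return sum(values) / len(values)

    diff_event = [n - o for o, n, y in zip(old_risks, new_risks, outcomes) if y == 1]
    diff_nonevent = [n - o for o, n, y in zip(old_risks, new_risks, outcomes) if y == 0]
    return mean(diff_event) - mean(diff_nonevent)


if __name__ == "__main__":
    # Toy predicted risks before and after adding a marker (hypothetical)
    old = [0.15, 0.25, 0.08, 0.12, 0.05, 0.22]
    new = [0.25, 0.30, 0.12, 0.08, 0.04, 0.15]
    outcomes = [1, 1, 1, 0, 0, 0]
    cutoffs = (0.10, 0.20)  # assumed category boundaries
    print(nri(old, new, outcomes, cutoffs), idi(old, new, outcomes))
```

Note that categorical NRI depends on the chosen cutoffs, so these should be clinically meaningful and fixed in advance.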

Adding a new biomarker to a model that uses a published risk score usually yields greater NRI and IDI and a larger increase in the c-statistic, and so gives an overly optimistic view of the true predictive ability of the biomarker.¹³ We should instead assess the incremental yield of a new biomarker by re-estimating the coefficients of the standard predictors in the current study data, rather than relying on the previously published model of the standard risk factors. Furthermore, correlated variables may remain in the model under backward elimination, whereas none of them might enter the model under forward selection.¹⁴ Variable selection should therefore be considered more carefully than before to determine the most plausible model for risk prediction.

An initial model for predicting heart attack comprised age, blood pressure category defined by systolic and diastolic blood pressures, low-density lipoprotein or total cholesterol, diabetes mellitus, and smoking status.⁵ Later, HDL cholesterol was entered into the model.¹⁵ Yatsuya et al⁶ further added non-HDL cholesterol and use of antihypertensives. It is interesting that they considered the effect of high blood pressure separately with and without medication, and that they used an interaction term between the two in model building. This may have enabled them to identify a difference in the importance of high blood pressure with and without medication. Although they also incorporated an interaction between age and sex, the final model could alternatively be constructed separately by sex.

Discrimination greater than 80% is extraordinarily high because it leaves no more than 20% as the residual, which is perhaps attributable to genetic and accidental events. An extensive external validation of the Framingham risk score by Wilson et al⁵ using previous studies demonstrated that its performance varied considerably among study samples.¹⁶ This is perhaps because most studies were conducted as retrospective analyses, which suggests that the predictions should be re-confirmed prospectively in future.

The recognition of new risk factors will lead to a better approach to identifying persons who are in the early stages of, or at high risk for, the disease of concern. Rules based on risk equations are intended to guide clinicians in their everyday decision making or patient communication. Yet appropriate advice should be provided only by the clinicians or healthcare professionals who are familiar with the patient's personal circumstances and views as well as the medical information necessary for risk prediction. The ultimate goal of risk prediction must be an impact analysis in which the usefulness of the score in the clinical setting is evaluated in terms of cost-benefit, patient satisfaction, and time/resource allocation.

References

1. Wilson PWF, Castelli WP, Kannel WB. Coronary risk prediction in adults: The Framingham heart study. Am J Cardiol 1987; 59: 91G–94G.
2. Gage BF, Waterman AD, Shannon W, Boechler M, Rich MW, Radford MJ. Validation of clinical classification schemes for predicting stroke: Results from the national registry of atrial fibrillation. JAMA 2001; 285: 2864–2870.
3. Okumura K, Inoue H, Atarashi H, Yamashita T, Tomita H, Origasa H. Validation of CHA₂DS₂-VASc and HAS-BLED scores in Japanese patients with nonvalvular atrial fibrillation: An analysis of the J-RHYTHM registry. Circ J 2014; 78: 593–599.
4. Fujii T, Suzuki T, Torii S, Murakami T, Nakano M, Nakazawa G, et al. Diagnostic accuracy of global registry of acute coronary events (GRACE) risk score in ST-elevation myocardial infarction for in-hospital and 360-day mortality in Japanese patients. Circ J 2014; 78: 2950–2954.
5. Wilson PWF, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation 1998; 97: 1837–1847.
6. Yatsuya H, Iso H, Li Y, Yamagishi K, Kokubo Y, Saito I, et al. Development of a risk equation for the incidence of coronary artery disease and ischemic stroke for middle-aged Japanese: Japan Public Health Center-based prospective study. Circ J 2016; 80: 1386–1395.
7. Janssens ACJW, Ioannidis JPA, van Duijn CM, Little J, Khoury MJ. Strengthening the reporting of genetic risk prediction studies: The GRIPS statement. Eur J Epidemiol 2011; 26: 255–259.
8. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. Ann Intern Med 2015; 162: 55–63.
9. Harrell FE, Lee K, Mark DB. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statist Med 1996; 15: 361–387.
10. Ware JH. The limitations of risk factors as prognostic tools. N Engl J Med 2006; 355: 2615–2617.
11. Pencina MJ, D’Agostino RB, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Statist Med 2008; 27: 157–172.
12. Banerjee A, Fauchier L, Bernard-Brunet A, Clementy N, Lip GYH. Composite risk factors and composite endpoints in the risk prediction of outcomes in anticoagulated patients with atrial fibrillation: The Loire Valley Atrial Fibrillation Project. Thromb Haemost 2014; 111: 549–556.
13. Xanthakis V, Sullivan LM, Vasan RS, Benjamin EJ, Massaro JM, D’Agostino RB Sr, et al. Assessing the incremental predictive performance of novel biomarkers over standard predictors. Statist Med 2014; 33: 2577–2584.
14. Derksen S, Keselman HJ. Backward, forward, and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. Br J Math Stat Psychol 1992; 45: 265–282.
15. D’Agostino RB Sr, Grundy S, Sullivan LM, Wilson P. Validation of the Framingham coronary heart disease prediction scores: Results of a multiple ethnic groups investigation. JAMA 2001; 286: 180–187.
16. Brindle P, Beswick A, Fahey T, Ebrahim S. Accuracy and impact of risk assessment in the primary prevention of cardiovascular disease: A systematic review. Heart 2006; 92: 1752–1759.
