Circulation Journal
Online ISSN : 1347-4820
Print ISSN : 1346-9843
ISSN-L : 1346-9843
Editorials
Some Methodological Issues and Future of Risk Prediction Studies in Cardiology
Hideki Origasa

2016 Volume 80 Issue 6 Pages 1314-1315


Risk prediction modeling probably dates back to the article by Wilson et al in 1987.1 In the current era of personalized medicine, studies on prediction models are abundant. The well-known CHADS2 score and others have been validated by risk prediction analysis,2–4 and with the increasing interest in biomarkers and in cellular and genetic markers, further growth in this area is anticipated. As shown in Table 1, there are several methods of conducting risk prediction. In this issue of the Journal, Yatsuya et al6 report a risk equation approach, like that of the Framingham heart attack risk score,5 which might be the most frequently applied method. Because risk prediction has become an important substudy in cohort or registry studies, the GRIPS7 and TRIPOD8 guidelines have been published to strengthen the reporting and conduct of such studies.

Table 1. Methods of Conducting the Risk Prediction of Disease
1. Risk equations (eg, Framingham heart attack risk score5)
2. Risk scores (eg, CHADS2 score2)
3. Nomograms
4. Decision trees such as classification and regression tree (CART)
5. Machine learning for disease prediction using artificial neural networks
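
To make the second entry of Table 1 concrete, the following is a minimal sketch of a points-based risk score, using the widely published CHADS2 rule (1 point each for congestive heart failure, hypertension, age 75 years or older, and diabetes mellitus; 2 points for prior stroke or transient ischemic attack). The function and argument names are illustrative assumptions, not taken from any cited study.

```python
def chads2_score(chf, hypertension, age, diabetes, prior_stroke_or_tia):
    """Return the CHADS2 point total (0 to 6) for one patient."""
    score = 0
    score += 1 if chf else 0                  # C: congestive heart failure
    score += 1 if hypertension else 0         # H: hypertension
    score += 1 if age >= 75 else 0            # A: age 75 years or older
    score += 1 if diabetes else 0             # D: diabetes mellitus
    score += 2 if prior_stroke_or_tia else 0  # S2: prior stroke or TIA
    return score


# Hypothetical patient: hypertensive, 78 years old, with a prior stroke.
example = chads2_score(chf=False, hypertension=True, age=78,
                       diabetes=False, prior_stroke_or_tia=True)
print(example)  # 4
```

A risk score of this kind trades some precision for bedside simplicity, whereas a risk equation keeps the fitted regression coefficients and returns a predicted probability.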

Article p 1386

As shown in Table 2, there are 4 questions that should be asked when developing a new risk equation. Discrimination measures how accurately the risk prediction equation distinguishes people who will develop disease from those who will not. This overall ability is typically described by the c-statistic.9 When a new marker is added to a model, the change in the c-statistic indicates how much discrimination has improved. However, the c-statistic is insensitive to small improvements in model performance when the model already includes important predictors. A moderately strong association between exposure and outcome (say, odds ratio >3.0) is necessary for the addition of a new risk factor to meaningfully improve overall discrimination.10 An unpublished document by Pencina gives a real example of this for the prediction of a first coronary event (https://www.lerner.ccf.org/qhs/outcomes/documents/pencina.pdf). The initial model comprised age, sex, diabetes mellitus, smoking status, systolic blood pressure, and total cholesterol as the standard risk factors, and high-density lipoprotein (HDL) cholesterol was the new marker. HDL cholesterol was highly significant (hazard ratio 0.65, P<0.0001), even with all other variables kept in the model. However, the increase in the c-statistic with inclusion of HDL cholesterol was minimal (ie, from 0.762 to 0.774; P=0.092).
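
As a rough illustration of what the c-statistic measures, the sketch below computes it for a binary outcome as the proportion of (event, non-event) pairs in which the person with the event received the higher predicted risk. It is a simplification under stated assumptions: it ignores censoring, which Harrell's c-statistic for time-to-event data must handle, and the function name and toy data are hypothetical.

```python
def c_statistic(risks, outcomes):
    """Concordance for a binary outcome (1 = event, 0 = no event).

    Counts the share of (event, non-event) pairs in which the event has the
    higher predicted risk; pairs with tied risks count as one half.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


# Toy data (hypothetical): two non-events followed by two events.
toy_risks = [0.10, 0.40, 0.35, 0.80]
toy_outcomes = [0, 0, 1, 1]
print(c_statistic(toy_risks, toy_outcomes))  # 0.75
```

Because the statistic depends only on the ranking of predicted risks, a new marker that shifts many predictions slightly without reordering many pairs changes it very little, which is consistent with the small gain described above.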

Table 2. Important Questions to Ask to Validate a Risk Score Equation
1. How accurate is the score?
 Sensitivity, Specificity
2. How well can the score discriminate?
 ROC curve, Harrell’s c-statistic
3. Is the score correctly calibrated?
 Agreement between observed and predicted values
 Hosmer-Lemeshow goodness-of-fit statistic
 Net reclassification improvement (NRI)
 Integrated discrimination improvement (IDI)
4. Is the score generalizable?
 Internal/external/temporal validation, cross-validation, bootstrapping
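
The calibration question in Table 2 (agreement between observed and predicted values) can be sketched as follows. Subjects are sorted by predicted risk and split into groups, and the observed and expected numbers of events are compared group by group, in the manner of the Hosmer-Lemeshow statistic. The grouping rule, function name, and return format are assumptions of this sketch; the P value, conventionally taken from a chi-square distribution with (groups - 2) degrees of freedom, is deliberately left out.

```python
def hosmer_lemeshow(risks, outcomes, groups=10):
    """Return (statistic, rows); each row is (n, observed events, expected events)."""
    # Stable sort by predicted risk only, so tied risks keep their input order.
    pairs = sorted(zip(risks, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer subjects than groups
        size = len(chunk)
        expected = sum(risk for risk, _ in chunk)       # sum of predicted risks
        observed = sum(outcome for _, outcome in chunk)  # count of events
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
        rows.append((size, observed, expected))
    return statistic, rows


# Hypothetical, badly calibrated predictions: low risks assigned to the events.
stat, table = hosmer_lemeshow([0.2, 0.2, 0.8, 0.8], [1, 1, 0, 0], groups=2)
```

A large statistic signals disagreement between observed and predicted event counts; the per-group rows are often more informative than the single summary number.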

Because the usefulness of discrimination measures has been questioned, research has focused on finding a quantity that measures how well a new model classifies persons with and without the outcome compared with the basic model. Such a quantity is usually referred to as a measure of calibration, and net reclassification improvement (NRI) is the most frequently applied one.11 It is more sensitive and interpretable than the discrimination measure, and it quantifies the number of persons correctly reclassified into clinically meaningful higher or lower risk categories with the addition of a new predictor. Integrated discrimination improvement (IDI) is a continuous version of NRI in which probability differences are used instead of categories.11 For example, the CHADS2 score produced a c-statistic of 0.641 for ischemic stroke, which was exactly the same as that obtained when using the CHADS2 and HAS-BLED scores together.12 However, NRI increased by 4.7% when the HAS-BLED score was added to CHADS2, and IDI also increased by 1.1%.12 Thus, calibration measures such as NRI and IDI ought to be applied more frequently than the c-statistic.
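
The definitions of NRI and IDI can be sketched in a few lines. Categorical NRI adds the net proportion of events moved to a higher risk category to the net proportion of non-events moved to a lower one; IDI is the mean change in predicted risk among events minus the mean change among non-events. The category cutoffs, function names, and data below are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def risk_category(risk, cutoffs):
    """Index of the risk category, given ascending cutoffs."""
    return bisect_right(cutoffs, risk)


def nri(old_risks, new_risks, outcomes, cutoffs):
    """Categorical net reclassification improvement (outcomes coded 0/1)."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for old, new, y in zip(old_risks, new_risks, outcomes):
        total[y] += 1
        old_cat = risk_category(old, cutoffs)
        new_cat = risk_category(new, cutoffs)
        if new_cat > old_cat:
            up[y] += 1
        elif new_cat < old_cat:
            down[y] += 1
    event_part = (up[1] - down[1]) / total[1]      # events should move up
    nonevent_part = (down[0] - up[0]) / total[0]   # non-events should move down
    return event_part + nonevent_part


def idi(old_risks, new_risks, outcomes):
    """Integrated discrimination improvement (outcomes coded 0/1)."""
    event_changes = [n - o for o, n, y in zip(old_risks, new_risks, outcomes) if y == 1]
    nonevent_changes = [n - o for o, n, y in zip(old_risks, new_risks, outcomes) if y == 0]
    mean_event = sum(event_changes) / len(event_changes)
    mean_nonevent = sum(nonevent_changes) / len(nonevent_changes)
    return mean_event - mean_nonevent


# Hypothetical predictions from a basic model and from a model with a new marker.
old = [0.05, 0.15, 0.15, 0.25]
new = [0.05, 0.25, 0.05, 0.25]
y = [0, 1, 0, 1]
cutoffs = [0.10, 0.20]  # assumed boundaries of low/intermediate/high risk
print(nri(old, new, y, cutoffs), idi(old, new, y))
```

Note that categorical NRI depends on the chosen cutoffs, so they should be clinically meaningful and fixed before the analysis.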

Adding a new biomarker to a model with a published risk score usually leads to greater NRI and IDI and to increases in the c-statistic, and so gives an overly optimistic view of the true predictive ability of the biomarker.13 We should instead assess the incremental yield of a new biomarker by re-estimating the coefficients for the standard predictors using the current study data, rather than relying on the previously published model of the standard risk factors. Furthermore, correlated variables may remain in the model under backward elimination, whereas none of them might enter the model under forward selection.14 Variable selection should therefore be considered more carefully than before to determine the most plausible model for risk prediction.

An initial model for predicting heart attack comprised age, blood pressure class derived from systolic and diastolic blood pressures, low-density lipoprotein or total cholesterol, diabetes mellitus, and smoking status.5 Later, HDL cholesterol was entered into the model.15 Yatsuya et al6 further added non-HDL cholesterol and use of antihypertensives. It is interesting that they considered the effect of high blood pressure separately with and without medication, and that they used an interaction term between the two in model building. This may have enabled them to identify a difference in the importance of high blood pressure with and without medication. Although they also incorporated an interaction between age and sex, a final model might instead be constructed separately by sex.
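
To show how an interaction term lets the effect of high blood pressure differ by medication status, the sketch below uses a generic logistic risk equation. It is not the model of Yatsuya et al: the logistic form, variable coding, and every coefficient are hypothetical values made up for illustration.

```python
import math

# Hypothetical coefficients on the log-odds scale (assumptions, not estimates).
COEF = {
    "intercept": -5.0,
    "age_per_year": 0.05,
    "high_bp": 0.6,
    "on_medication": 0.2,
    "high_bp_x_medication": 0.3,  # interaction term
}


def linear_predictor(age, high_bp, on_medication, coef=COEF):
    """Log-odds of the outcome; high_bp and on_medication are coded 0/1."""
    return (coef["intercept"]
            + coef["age_per_year"] * age
            + coef["high_bp"] * high_bp
            + coef["on_medication"] * on_medication
            + coef["high_bp_x_medication"] * high_bp * on_medication)


def predicted_risk(age, high_bp, on_medication, coef=COEF):
    """Predicted probability from the logistic risk equation."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(age, high_bp, on_medication, coef)))


def high_bp_log_odds_ratio(on_medication, coef=COEF):
    """Effect of high blood pressure, which depends on medication status."""
    return coef["high_bp"] + coef["high_bp_x_medication"] * on_medication


print(high_bp_log_odds_ratio(0), high_bp_log_odds_ratio(1))
```

Without the interaction term, the log odds ratio for high blood pressure would be forced to be the same in treated and untreated persons.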

Discrimination greater than 80% is extraordinarily high, because it leaves a residual of only 20%, perhaps caused by genetic and accidental events. An extensive external validation of the Framingham risk score of Wilson et al5 using previous studies demonstrated that its performance varied considerably among study samples.16 This is perhaps because most studies were conducted as retrospective analyses, which suggests that the predictions should be re-confirmed prospectively in the future.

The recognition of new risk factors will lead to a better approach to identifying persons who are in the early stages of, or at high risk for, the disease of concern. Rules based on risk equations are intended to guide clinicians in their everyday decision making or patient communication. Yet appropriate advice should be provided only by clinicians or healthcare professionals who are familiar with the patient's personal environment and views as well as the medical information necessary for risk prediction. The ultimate goal of risk prediction must be an impact analysis in which the usefulness of the score in the clinical setting is evaluated in terms of cost-benefit, patient satisfaction, and time/resource allocation.

References
 
© 2016 THE JAPANESE CIRCULATION SOCIETY