Article ID: CJ-17-1185
Predicting a patient’s prognosis advances medical decision making in clinical settings. Risk prediction models (also called prognostic models, prediction rules, or risk scores) are tools that estimate an individual patient’s risk, or probability of an outcome, as a numerical value. Although many prediction models have been published, few are used in routine clinical settings because of their inconvenience and complexity.1 In this issue of the Journal, Hu et al show how they applied the CHA2DS2-VASc score to predict the incidence of atrial fibrillation (AF) in patients with chronic obstructive pulmonary disease (COPD).2 Previous studies have reported prediction models for AF based on community cohorts,3–5 but most cardiologists are not familiar with these community-based models. It seems reasonable to evaluate individual risk with a tool we already know, but we must be aware that the CHA2DS2-VASc score was developed as a model for predicting ischemic stroke in patients with AF, not incident AF in COPD patients. Applying the wrong prediction model may cause over- or underestimation of a patient’s risk. To reduce this risk of misestimation, we need to understand how risk prediction models are evaluated and why high-performance models are complex.
Article p ????
Performance of Prediction Models
We need to assess the performance of risk prediction models carefully before applying their results in the clinical setting. The performance of a risk prediction model is divided into two components: discrimination and calibration.6
Discrimination is the ability of the prediction model to distinguish patients who develop the outcome event from those who do not; that is, to separate high-risk from low-risk patients.6 Well-known measures of discrimination are the receiver-operating characteristic (ROC) curve and the C-statistic. The C-statistic is the probability that a randomly selected patient who developed the event had a higher risk score than a randomly selected patient who did not. The ROC curve plots sensitivity against (1−specificity) for all possible cutoff points (Figure 1). When the outcome is a binary event, the area under the ROC curve (AUC) is equivalent to the C-statistic, which ranges from 0.5 to 1; 0.5 indicates no discriminatory ability and 1 indicates perfect discriminatory ability. A low C-statistic therefore indicates that the model does not assign patients to the correct risk category. When dealing with censored data, the ROC curve and AUC are not appropriate measures of discriminatory performance; in that case, Harrell’s C-statistic should be used, and its interpretation is the same as that of the AUC. In addition to the C-statistic and AUC, other statistical measures such as the integrated discrimination improvement and the net reclassification index are recommended when comparing multiple prediction models for the same outcome.
Figure 1. Example of a receiver-operating characteristic (ROC) curve. The area under the ROC curve (AUC) is equivalent to the C-statistic.
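As a concrete illustration of these definitions, the minimal sketch below computes the C-statistic for an invented set of predicted risks and binary outcomes, once with scikit-learn’s roc_auc_score and once directly from the pairwise definition given above; the numbers carry no clinical meaning.

```python
# Minimal sketch of the C-statistic for a binary outcome.
# The risk scores and outcomes are invented for illustration.
from itertools import product

from sklearn.metrics import roc_auc_score

y = [1, 0, 1, 0, 0, 1, 0, 0]                      # 1 = event occurred, 0 = no event
risk = [0.8, 0.3, 0.6, 0.4, 0.1, 0.9, 0.7, 0.2]   # predicted risks

# AUC via scikit-learn (equivalent to the C-statistic for a binary outcome)
print(roc_auc_score(y, risk))

# The same quantity from its definition: the probability that a randomly
# chosen patient with the event has a higher risk score than a randomly
# chosen patient without the event (ties count one-half).
pairs = [(r1, r0)
         for (r1, o1), (r0, o0) in product(zip(risk, y), repeat=2)
         if o1 == 1 and o0 == 0]
c = sum(1.0 if r1 > r0 else 0.5 if r1 == r0 else 0.0
        for r1, r0 in pairs) / len(pairs)
print(c)
```

Both computations return the same value, which is why the AUC and the C-statistic can be used interchangeably for binary outcomes.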
Calibration is the agreement between the probabilities predicted by the model and the observed outcome frequencies; it indicates the accuracy of the model’s predictions.6 When the calibration of a prediction model is poor, the model over- or underestimates the absolute probability of the outcome event, no matter how good its discrimination is. Therefore, calibration should be kept as good as possible. Calibration can be measured in two ways: statistically or graphically. The common statistical method in medical research is the Hosmer-Lemeshow test, which usually divides patients into quintiles or deciles of predicted risk and compares the predicted and observed event percentages in each group. We can also evaluate the calibration of a prediction model graphically, by comparing predicted and observed values at different risk levels (Figure 2). To compare the calibration of different prediction models, the Akaike information criterion or the Bayesian information criterion can be applied; lower values of either index suggest better calibration.
Figure 2. Example of graphical calibration. The gap between the actual and predicted values should be small.
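To make the decile-based comparison concrete, here is a rough sketch of the Hosmer-Lemeshow statistic in Python; the predicted probabilities and outcomes are simulated for illustration only, and a real analysis would use a dedicated statistical package.

```python
# Rough sketch of the Hosmer-Lemeshow statistic by deciles of predicted risk.
# Data are simulated and carry no clinical meaning.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=1000)   # predicted probabilities
y = rng.binomial(1, p)                   # simulated observed outcomes

# Assign each patient to a decile of predicted risk
deciles = np.quantile(p, np.linspace(0, 1, 11))
group = np.digitize(p, deciles[1:-1])    # group labels 0..9

hl = 0.0
for g in range(10):
    mask = group == g
    obs = y[mask].sum()                  # observed events in the group
    exp = p[mask].sum()                  # expected (predicted) events
    n = mask.sum()
    hl += (obs - exp) ** 2 / (exp * (1 - exp / n))

# The statistic is commonly referred to a chi-squared distribution
# with (number of groups - 2) degrees of freedom.
print(hl, chi2.sf(hl, df=10 - 2))
```

A large statistic (small P value) suggests a systematic gap between predicted and observed event rates, i.e., poor calibration.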
The CHA2DS2-VASc score is widely used to decide on initiation of anticoagulant therapy in AF patients to prevent ischemic stroke. However, the original paper reported only moderate discriminatory power.7 Suzuki et al also reported a C-statistic for the CHA2DS2-VASc score of 0.671 (95% confidence interval: 0.606–0.736) in pooled data from four Japanese registries.8 In general, a prediction model based on a full mathematical equation classifies patients more accurately than a simplified point score. Recent research shows that ATRIA, a more complex risk score, performs better than CHA2DS2-VASc.9,10 The ATRIA score is based on regression coefficients and additionally considers an interaction term for prior stroke, whereas the CHA2DS2-VASc score is not based on regression coefficients, which is technically considered an incorrect way to construct a prediction model,11 and does not consider interaction terms.
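The contrast between an integer point score and a coefficient-based model can be made concrete in a few lines of code. The CHA2DS2-VASc function below follows the published scoring rule; the coefficient-based alternative is purely schematic, with made-up weights, only to show where regression coefficients and an interaction term would enter.

```python
import math

def cha2ds2_vasc(chf, hypertension, age, diabetes, stroke_tia, vascular, female):
    """CHA2DS2-VASc: one integer point per risk factor, two points for
    age >=75 and for prior stroke/TIA (the published scoring rule)."""
    score = chf + hypertension + diabetes + vascular + female
    score += 2 if age >= 75 else (1 if age >= 65 else 0)
    score += 2 * stroke_tia
    return score

def coefficient_based_risk(x, beta0, beta, interaction):
    """Schematic coefficient-based model (made-up structure): a weighted
    sum of predictors plus an interaction term, mapped to a probability
    by the logistic function."""
    lp = beta0 + sum(b * xi for b, xi in zip(beta, x))
    lp += interaction * x[0] * x[1]   # e.g., a prior-stroke x age term
    return 1 / (1 + math.exp(-lp))

# A 70-year-old woman with hypertension and no other risk factors scores 3
print(cha2ds2_vasc(chf=0, hypertension=1, age=70, diabetes=0,
                   stroke_tia=0, vascular=0, female=1))
```

The point score is easy to compute at the bedside, but rounding continuous coefficients into integers and dropping interaction terms is exactly what costs discriminatory power.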
However, physicians tend to prefer a concise model rather than a complex model even when the complex model has higher discriminatory power. Kappen et al12 point out 4 perceptual barriers to using risk prediction models in clinical practice: (1) “the predicted outcome is not the main area of attention for physicians”; (2) “the decision-making process of physicians is intuitive rather than analytical”; (3) “the probabilistic knowledge of the outcome is difficult to use in decision making”; and (4) “a prediction model does not weigh the benefits and risks of prophylactic drugs with regard to the patient’s comorbidity”. This may be why the CHA2DS2-VASc score is widely used.
Prediction Models in the Precision Medicine Era
Precision medicine is a revolutionary approach that takes into account individual differences in lifestyle, environment and biology, beyond traditional personalized medicine.13 The CHA2DS2-VASc score is not sufficient for precision medicine because of its moderate discriminatory power. To accomplish precision medicine, we must make effective use of computer-based complex models, such as machine learning, that go beyond our intuition.14 However, prediction models only suggest the probability of an event; the decision is still up to us. Thus, in the precision medicine era, physicians must understand the results of prediction models and, using these tools, must be able to communicate the results to patients to assist them in their own decision making.15
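As a toy illustration of that last point, the sketch below fits a machine-learning classifier (a gradient-boosting model from scikit-learn) to simulated data with a nonlinear risk structure and reports one patient’s predicted risk. The data and setup are invented, so this is a schematic sketch rather than a clinical model; note that the output is only a probability, and the decision remains with physician and patient.

```python
# Toy illustration: a machine-learning model outputs only a probability.
# Data are simulated and carry no clinical meaning.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))               # five hypothetical predictors
logit = X[:, 0] + 0.5 * X[:, 1] * X[:, 2]    # nonlinear risk, beyond intuition
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

print("C-statistic:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
print("predicted risk, first test patient:", model.predict_proba(X_test[:1])[0, 1])
```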