2020, Volume 41, Issue 1, pp. 1-35
Prediction models are usually developed through model construction and validation. For binary or time-to-event outcomes in particular, a risk prediction model should be evaluated on several aspects of predictive accuracy. Using unified algebraic notation, we present evaluation measures for model validation from five statistical viewpoints that are frequently reported in the medical literature: 1) the Brier score for prediction error; 2) sensitivity, specificity, and the C-index for discrimination; 3) calibration-in-the-large, the calibration slope, and the Hosmer-Lemeshow statistic for calibration; 4) the net reclassification improvement and integrated discrimination improvement indexes for reclassification; and 5) net benefit for clinical usefulness. Graphical representations such as the receiver operating characteristic curve, the calibration plot, and the decision curve help researchers interpret these measures. We discuss the interrelationships among the measures and extend their definitions and estimators to time-to-event data subject to outcome censoring. We illustrate their calculation using example datasets, with SAS code provided in the web appendix.
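As a rough illustration of a few of the measures named above, the following Python sketch computes the Brier score, sensitivity and specificity at a risk threshold, the C-index, and net benefit for a binary outcome. It is not the SAS code from the web appendix: the function names and the toy data are hypothetical, the formulas are the standard textbook definitions for uncensored binary outcomes, and the calibration and reclassification measures and the extension to censored time-to-event data are not covered.

```python
from typing import Sequence, Tuple


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def sensitivity_specificity(
    y: Sequence[int], p: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Classify as positive when predicted risk >= threshold."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def c_index(y: Sequence[int], p: Sequence[float]) -> float:
    """Proportion of event/non-event pairs in which the event has the
    higher predicted risk; ties in predicted risk count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit at a threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data (invented for illustration only): observed outcomes and predicted risks.
y_obs = [1, 0, 1, 0]
p_hat = [0.8, 0.3, 0.6, 0.4]

print("Brier score:", brier_score(y_obs, p_hat))
print("Sensitivity, specificity at 0.5:", sensitivity_specificity(y_obs, p_hat, 0.5))
print("C-index:", c_index(y_obs, p_hat))
print("Net benefit at 0.5:", net_benefit(y_obs, p_hat, 0.5))
```

Evaluating `net_benefit` over a grid of thresholds and plotting the result against the threshold gives the decision curve mentioned in the abstract; sweeping the threshold in `sensitivity_specificity` traces the receiver operating characteristic curve in the same way.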