Circulation Journal
Online ISSN : 1347-4820
Print ISSN : 1346-9843
ISSN-L : 1346-9843
Epidemiology
Comparing the Consistency and Performance of Various Coronary Heart Disease Prediction Models for Primary Prevention Using a National Representative Cohort in Taiwan
Kuo-Liong Chien, Hung-Ju Lin, Ta-Chen Su, Yun-Yu Chen, Pei-Chun Chen

2018 Volume 82 Issue 7 Pages 1805-1812

Abstract

Background: Predicting future coronary artery disease (CAD) risk by model-based approaches can facilitate identification of high-risk individuals for prevention and management. Therefore, we compared the consistency and performance of various CAD models for primary prevention using 1 external validation dataset from a national representative cohort in Taiwan.

Methods and Results: The 10 CAD prediction models were assessed in a validation cohort of 3559 participants (≥35 years old, 53.5% women) from a Taiwanese national representative cohort that was followed up for a median 9.70 (interquartile range, 9.63–9.74) years; 63 cases were documented as developing CAD events. The overall κ value was 0.51 for all 10 models, with a higher value for women than for men (0.53 for women, 0.40 for men). In addition, the areas under the receiver operating characteristics curves ranged from 0.804 (95% confidence interval, 0.758–0.851) to 0.847 (95% confidence interval, 0.805–0.889). All non-significant chi-square values indicated good calibration ability.

Conclusions: Our study demonstrated these 10 CAD prediction models for primary prevention were feasible and validated for use in Taiwanese subjects. Further studies of screening and management are warranted.

Prevention and control of non-communicable diseases (NCDs) is a global public health task, and screening for and identifying individuals at high risk of coronary artery disease (CAD) is an important strategy for population-level primary prevention.1 Validated and feasible prediction models based on community cohort studies have therefore been implemented as tools for identifying individuals at high risk of atherosclerotic diseases.2–5 The Framingham risk score model has been considered a classical tool for CAD globally;6 however, this model overestimates the incidence risk in various populations.7,8 In addition, an updated review of the current cardiovascular disease risk models for primary prevention demonstrated substantial heterogeneity in predictor and outcome definitions, and a lack of external validation of the models.9 External validation and head-to-head comparison of risk models have been recommended as important tasks for further model comparison.9 Moreover, how to evaluate consistency across the various models and how to compare their performance have become important issues for clinical application.10 Therefore, we compared the consistency and performance of various models for primary prevention of CAD using 1 external validation dataset from a national representative cohort in Taiwan.

Methods

Study Design and Study Participants of the Validation Cohort

The study design and participants have been described previously.11–14 In brief, the study design was a population-based cohort based on a national representative study sampling; the participants were selected from Taiwan’s 2002 Triple High Survey. The original participants were randomly sampled from The National Health Interview Survey conducted by the National Health Promotion Administration during 2001 to select nationally representative samples in Taiwan. Initially, 6,550 participants were recruited during 2002 and they completed clinical and biochemical measures in standard procedures, under the supervision of the Bureau of Health Promotion. Various baseline covariates, including age, sex, systolic blood pressure, smoking, physical exercise, alcohol drinking and family history of CAD, were measured systematically in standard procedures.11–15

Outcome Ascertainment Strategy of the Validation Cohort

The ascertainment strategy for CAD events was described previously.15 In brief, we linked the cohort with claims data from Taiwan’s National Health Insurance Research Database (NHIRD), which has enrolled approximately 23 million people and covers nearly 99% of the Taiwanese population. This dataset contains data on utilization of all national health insurance resources, including outpatient visits, hospital care, prescribed medications, and the death registry. The outcome medical diagnosis and medication history datasets were maintained and quality controlled by the Health and Welfare Data Science Center, Taiwan, to improve the accuracy of the CAD diagnoses.

Other underlying diseases were ascertained from International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) codes recorded during 2000–2002, and required either 2 diagnoses in outpatient department records or a single diagnosis in in-hospital records. The selection and grouping of medications followed the Anatomical Therapeutic Chemical (ATC) Classification System recommended by the World Health Organization (WHO); antihypertensive medications were defined accordingly.

We excluded individuals who had a history of cardiovascular disease, including acute myocardial infarction (AMI), stroke or transient ischemic attack. We also excluded participants aged less than 18 years, or with chronic kidney disease needing renal replacement therapy, chronic liver disease, rheumatic heart disease, or valvular heart disease.

The primary endpoint was CAD events, ascertained from the NHIRD according to the ICD9-CM codes recorded during hospitalizations from 2003 to 2011. Follow-up ended when the participant developed CAD or on 31 December 2011, whichever came first. The diagnoses of CAD (410–411), congestive heart failure (428), and AMI (410–412) were also identified. The causes of death were coded by ICD-9 from 1986 to 2007, and by ICD-10 (I20–I25) from 2008.

Description of the 10 CAD Prediction Models

The included covariates, points, coefficients and baseline risks for the 10 prediction models are listed in Table S1. The models were based on a Taiwanese community cohort,4 the Framingham risk score model,6 the PROCAM model,16 Japanese cohorts including Hisayama,17 Suita,18 the Japan Public Health Center Study19 and NIPPON DATA80,20 and the Korean Heart Study.21 Compared with the coefficient-based models (the Korean Heart Study and the Japan Public Health Center Study), the points-based models were simpler, and their estimated risks could be calculated by hand.

Statistical Analysis

The basic characteristics of the study participants were stratified according to sex, and the continuous variables are shown as mean and standard deviation and the categorical variables are shown as number and proportion. We estimated the points and predicted incidence rates by various prediction models for the study participants.

Consistency and Reliability

After categorizing the predicted incidence rates from each model into tertiles, we used κ statistics to evaluate the consistency and agreement of the various models. We put all 10 models together and separated the participants into 3 groups according to the tertile values. The κ statistic estimated the agreement and concordance of the models' classifications.22 In addition, we performed a sensitivity analysis to detect the influence of specific models by re-calculating the κ value after deleting each model in turn; the absolute percentage change in κ was taken as the influence of that model and as an indicator of a potential outlier effect.23 We drew Bland-Altman plots, with the average of the predicted and observed risk on the x-axis and the difference between the predicted and observed risk on the y-axis, to evaluate the reliability of each model.24
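The agreement analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which multi-rater κ was used, so Fleiss' κ is assumed here, and the function names, model labels and tertile data are hypothetical, not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of rows, one row per participant.

    Each row holds one category label (here a tertile: 1, 2 or 3) per model.
    Assumption: Fleiss' kappa stands in for the unspecified kappa statistic.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two models to measure agreement")

    totals = Counter()   # how often each category was assigned overall
    p_bar = 0.0          # mean observed agreement across participants
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        agreeing_pairs = sum(c * c for c in counts.values()) - n_raters
        p_bar += agreeing_pairs / (n_raters * (n_raters - 1))
    p_bar /= n_subjects

    # Agreement expected by chance from the overall category proportions.
    n_ratings = n_subjects * n_raters
    p_exp = sum((c / n_ratings) ** 2 for c in totals.values())
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_exp) / (1.0 - p_exp)


def leave_one_model_out(ratings, model_names):
    """Return the full kappa and, per model, (kappa without it, % change)."""
    full = fleiss_kappa(ratings)
    changes = {}
    for j, name in enumerate(model_names):
        reduced = [row[:j] + row[j + 1:] for row in ratings]
        kappa = fleiss_kappa(reduced)
        changes[name] = (kappa, abs(kappa - full) / abs(full) * 100.0)
    return full, changes


# Hypothetical toy data: 6 participants, 4 models, tertile labels 1-3.
# Model "D" is constructed to disagree with the other three.
names = ["A", "B", "C", "D"]
tertiles = [
    [1, 1, 1, 3],
    [1, 1, 1, 3],
    [2, 2, 2, 3],
    [2, 2, 2, 1],
    [3, 3, 3, 1],
    [3, 3, 3, 2],
]
full_kappa, influence = leave_one_model_out(tertiles, names)
most_influential = max(influence, key=lambda m: influence[m][1])
print(round(full_kappa, 3), most_influential)
```

In this toy data the three identical models agree perfectly once D is removed, so deleting D gives κ = 1 and the largest percentage change, which is the pattern the sensitivity analysis is designed to expose.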

Discrimination and Calibration

We estimated the area under the receiver operating characteristic (ROC) curve (AUC): an ROC curve is a graph of sensitivity vs. 1−specificity (the false-positive rate) for various cutoff definitions of a positive diagnostic test result,25 and the AUC is a good indicator of discrimination for model performance. We listed the sensitivity and specificity for the best cutoff values from the various models. Statistical differences in the AUCs were compared using the method of DeLong et al.26 In addition, we assessed the goodness-of-fit for all models based on the Hosmer-Lemeshow test to compare the calibration performance among these models.27
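As a rough illustration of the discrimination measures, the Python sketch below computes the AUC as the probability that a participant with an event has a higher predicted risk than one without, and picks a cutoff by Youden's index. The choice of Youden's index is an assumption, since the article does not state how the best cutoff was defined; the function names and data are hypothetical, and the DeLong comparison is not implemented.

```python
def auc(scores, events):
    """AUC as the probability that a case scores higher than a non-case."""
    cases = [s for s, e in zip(scores, events) if e == 1]
    controls = [s for s, e in zip(scores, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5   # ties count as half
    return wins / (len(cases) * len(controls))


def best_cutoff(scores, events):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1.

    Assumption: Youden's J is used only as one common definition of "best".
    A participant is classified positive when score >= cutoff.
    """
    n_cases = sum(events)
    n_controls = len(events) - n_cases
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, e in zip(scores, events) if e == 1 and s >= cut)
        tn = sum(1 for s, e in zip(scores, events) if e == 0 and s < cut)
        sens, spec = tp / n_cases, tn / n_controls
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical predicted risks and event indicators (1 = CAD event).
risks = [0.01, 0.02, 0.03, 0.04, 0.05, 0.10, 0.20]
cad = [0, 0, 0, 1, 0, 1, 1]
print(auc(risks, cad))
print(best_cutoff(risks, cad))
```

The pairwise definition is quadratic in the sample size, which is acceptable for a sketch; a rank-based formula or an established statistics library would be the usual choice for a cohort of several thousand participants.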

All statistical analyses were performed using SAS version 9.4 (SAS Institute, Inc., Cary, NC, USA) and STATA version 14 (Stata Corp., College Station, TX, USA).

Results

The basic characteristics of the study participants of the validation cohort according to sex are listed in Table 1. Compared with men, women were slightly younger and had lower blood pressure, waist circumference, triglycerides, apolipoprotein B, uric acid, creatinine and hepatic enzyme levels, and higher high-density lipoprotein cholesterol and apolipoprotein A1 levels. Women were also more likely to have been prescribed antihypertensive medications. The distributions of total and low-density lipoprotein cholesterol, fasting glucose, glycated hemoglobin, type 2 diabetes and family history of CAD were similar between the sexes.

Table 1. Basic Characteristics of the Study Participants in the Three High Cohort in Taiwan
  Men (n=1,654) Women (n=1,905) P value
Mean SD Mean SD
Age, years 52.1 12.4 51.3 11.9 0.044
Systolic BP, mmHg 122.0 17.5 117.3 19.3 <0.0001
Diastolic BP, mmHg 79.9 11.2 74.9 11.0 <0.0001
Waist circumference, cm 86.1 9.6 77.9 9.9 <0.0001
Body mass index, kg/m2 26.8 2.3 24.6 2.6 <0.0001
Total cholesterol, mg/dL 191.0 38.0 192.8 38.2 0.15
LDL-cholesterol, mg/dL 122.0 26.3 120.5 27.4 0.09
HDL-cholesterol, mg/dL 51.9 15.9 60.1 15.0 <0.0001
Triglycerides, mg/dL 153.7 99.4 121.8 73.9 <0.0001
Apolipoprotein A1, mg/dL 139.3 23.1 154.4 24.7 <0.0001
Apolipoprotein B, mg/dL 98.5 24.0 92.2 24.7 <0.0001
Uric acid, mg/dL 7.10 1.70 5.60 1.58 <0.0001
Creatinine, mg/dL 1.03 0.21 0.79 0.26 <0.0001
eGFR, mL/min/1.73 m2 83.1 17.9 84.1 16.1 0.06
Fasting glucose, mg/dL 98.3 32.4 97.3 31.6 0.34
Hemoglobin A1c, % 5.55 1.16 5.48 1.13 0.06
AST, IU/L 23.0 16.6 19.1 13.0 <0.0001
ALT, IU/L 22.0 14.9 17.7 10.7 <0.0001
  N % N %  
Antihypertensive medication 517 31.3 723 38.0 <0.0001
Type 2 diabetes 178 10.8 175 9.2 0.12
Smoking history 22 1.3 21 1.1 0.54
Family history of CAD 266 16.1 286 15.0 0.38

ALT, alanine transaminase; AST, aspartate transaminase; BP, blood pressure; CAD, coronary artery disease; eGFR, estimated glomerular filtration rate; HDL, high-density lipoprotein; LDL, low-density lipoprotein.

A total of 3,559 participants (≥35 years old, 53.5% women) from a Taiwanese national representative cohort were followed up for a median 9.70 (interquartile range, 9.63–9.74) years; among them, 63 cases (32.8% women) of CAD events were documented. The estimated points and predicted risks of the various prediction models are listed in Table 2. Although the ranges of the estimated points varied, the mean predicted risks from the various models were broadly consistent, ranging from 0.004 for NIPPON DATA80 to 0.054 for the Hisayama model. The overall average predicted risk was 0.025, which was close to the mean predicted risks of the clinical (0.025), total cholesterol (0.021), Suita (0.020) and Japan Public Health Center Study (0.022) models. The Hisayama model had the highest mean predicted risk (0.054), which may be attributed to its combined outcome of coronary and stroke events.

Table 2. Estimated Points and Predicted Risks of Various Prediction Models for the Study Participants
  Mean SD Min. Max. Q1 Median Q3
Estimated points for various models
 Clinical model 7.0 3.5 1 17 4 7 9
 Total cholesterol model 10.8 4.2 2 23 8 11 14
 LDL-cholesterol model 10.0 4.2 1 23 7 9 13
 Framingham chart 8.1 6.0 −5 26 4 8 13
 PROCAM chart 29.4 16.5 0 84 16 27 41
 Japanese Hisayama model 3.6 2.9 0 14 1 3 5
 Japanese Suita model 33.6 13.6 10 82 23 32 43
 Japan Public Health Center Study 35.8 1.8 31 41 34 36 37
 Korean Heart Study 0.2 1.2 −2 4 −1 0 1
Predicted risks for various models
 Clinical model 0.025 0.030 0.003 0.240 0.006 0.015 0.026
 Total cholesterol model 0.021 0.029 0.001 0.280 0.005 0.011 0.026
 LDL-cholesterol model 0.017 0.029 0.001 0.276 0.004 0.006 0.019
 Framingham chart 0.043 0.059 0.005 0.300 0.005 0.010 0.060
 PROCAM chart 0.034 0.072 0.000 0.600 0.002 0.008 0.029
 Japanese Hisayama model 0.054 0.056 0.014 0.300 0.018 0.032 0.056
 Japanese Suita model 0.020 0.033 0.005 0.280 0.005 0.005 0.020
 Japanese NIPPON80 data 0.004 0.003 0.003 0.035 0.003 0.003 0.008
 Japan Public Health Center Study 0.022 0.041 0.000 0.598 0.002 0.007 0.023
 Korean Heart Study 0.009 0.014 0.000 0.178 0.002 0.004 0.011

Overall predicted risk average was 0.025. LDL, low-density lipoprotein.

Stratifying all participants into tertiles according to their risk status, we found the risk increased as the tertiles progressed (Figure 1). Among the models, NIPPON DATA80 seemed to provide a substantially higher risk estimate for the tertiles, indicating a discrepancy for prediction.

Figure 1.

Incidence rates of coronary artery disease events in the validation data, according to the tertiles of the risk models. LDL, low-density lipoprotein.

The consistency measures by κ statistics are shown in Table 3: the overall κ value was 0.51 for all 10 models, with a higher value for women than for men (0.53 vs. 0.40, respectively). In the sensitivity analysis, in which 1 model was deleted at a time, deleting the NIPPON DATA80 chart increased the κ value to 0.61 overall (0.52 for men, 0.61 for women) and produced the largest percentage change in κ (19.3% overall; 30.6% for men, 14.3% for women), indicating that the NIPPON DATA80 chart was not consistent with the other models. In contrast, deleting the clinical, low-density lipoprotein cholesterol, Hisayama or Suita model produced the smallest absolute changes in κ, indicating good consistency among these models.

Table 3. Agreement Among the Prediction Models by κ Values, Each Row Presenting the κ When Deleting the Respective Model
Model All Men Women
κ % absolute change κ % absolute change κ % absolute change
All 10 models 0.51 0.40 0.53
 -Clinical model 0.50 1.5 0.40 1.5 0.52 1.3
 -Total cholesterol model 0.50 2.2 0.39 2.1 0.52 1.4
 -LDL-cholesterol model 0.51 0.5 0.39 3.4 0.53 0.2
 -Framingham chart 0.49 3.6 0.39 4.1 0.51 3.1
 -PROCAM chart 0.49 4.4 0.38 6.4 0.51 3.4
 -Hisayama model 0.50 1.6 0.38 6.0 0.53 0.5
 -Suita model 0.52 2.2 0.42 3.4 0.54 1.1
 -Korean Heart Study 0.48 4.8 0.37 6.8 0.51 4.4
 -Japan Public Health Center Study 0.49 3.7 0.39 3.7 0.51 3.2
 -NIPPON80 data 0.61 19.3 0.52 30.6 0.61 14.3

LDL, low-density lipoprotein.
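The "% absolute change" column in Table 3 can be checked approximately from the rounded κ values. The short Python sketch below assumes the change is expressed relative to the κ for all 10 models; that formula and the function name are assumptions, not taken from the article.

```python
def percent_change(kappa_full, kappa_without):
    """Absolute change in kappa as a percentage of the all-model kappa."""
    return abs(kappa_without - kappa_full) / kappa_full * 100.0


# Rounded kappa values from Table 3 (all 10 models vs. NIPPON DATA80 deleted).
full = {"All": 0.51, "Men": 0.40, "Women": 0.53}
without_nippon = {"All": 0.61, "Men": 0.52, "Women": 0.61}

for group in full:
    change = percent_change(full[group], without_nippon[group])
    print(group, round(change, 1))
```

Recomputed this way, the changes are about 19.6%, 30.0% and 15.1%, close to the published 19.3%, 30.6% and 14.3%; the small gaps would be expected if the published percentages were computed from unrounded κ values.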

The AUC values of the various prediction models are listed in Table 4. By predicted risk, they ranged from 0.804 (95% confidence interval [CI], 0.758–0.851) for the total cholesterol model to 0.847 (95% CI, 0.805–0.889) for the Korean Heart Study model, with the exception of the NIPPON DATA80 chart, which had the lowest AUC (0.691; 95% CI, 0.628–0.754). Figure 2 shows the ROC curves of the prediction models for the study participants.

Table 4. Area Under the ROC Curves for Various Prediction Models, Applied to the Study Participants
  AUC 95% CI
By estimated points
 Clinical model 0.815 0.770–0.860
 Total cholesterol model 0.804 0.758–0.851
 LDL-cholesterol model 0.813 0.771–0.855
 Framingham chart 0.805 0.763–0.846
 PROCAM chart 0.831 0.791–0.871
 Hisayama model 0.826 0.784–0.868
 Suita model 0.850 0.809–0.891
 Japan Public Health Center Study 0.835 0.789–0.881
 Korean Heart Study 0.817 0.774–0.860
By predicted risks
 Clinical model 0.815 0.770–0.860
 Total cholesterol model 0.804 0.758–0.851
 LDL-cholesterol model 0.813 0.771–0.855
 Framingham chart 0.823 0.775–0.870
 PROCAM chart 0.832 0.790–0.874
 Hisayama model 0.826 0.784–0.868
 Suita model 0.844 0.799–0.889
 Japan Public Health Center Study 0.835 0.789–0.881
 NIPPON80 DATA model 0.691 0.628–0.754
 Korean Heart Study 0.847 0.805–0.889

AUC, area under the curve; CI, confidence interval; LDL, low-density lipoprotein; ROC, receiver-operating characteristic.

Figure 2.

Receiver-operating characteristic (ROC) curves of the predicted risk models for the study participants. JPHC, Japan Public Health Center; LDL, low-density lipoprotein.

The calibration performance measured by the Hosmer-Lemeshow statistic (Table 5) showed that the Suita model had the lowest chi-square value (5.2), followed by the clinical model (6.2), the total cholesterol model (6.3) and the Hisayama model (6.6), implying that the Suita model had the best goodness of fit for the CAD outcome. The Korean Heart Study model had the highest chi-square value (13.3), which did not reach statistical significance. The non-significant P values indicated good calibration ability of these models. The observed and expected numbers for the Suita model and the Korean Heart Study model are plotted in Figure 3, showing a slight underestimate of the predicted risks. The Bland-Altman analysis showed that agreement between the predicted and observed numbers was best for the Suita model, with a near-zero slope and the lowest R-square value. In addition, the point-based model underestimated the observed numbers (slope −0.086) and the Korean Heart Study model overestimated the observed numbers (slope 0.029) (Figure S1).
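The Bland-Altman summary quantities mentioned here (slope and R-square of the difference against the average) can be sketched as follows. This is a minimal Python illustration with hypothetical numbers; the function name, the 1.96 SD limits of agreement and the input data are assumptions and do not reproduce the study's analysis.

```python
from statistics import mean, stdev


def bland_altman(predicted, observed):
    """Bias, limits of agreement, and trend of difference on average."""
    averages = [(p + o) / 2.0 for p, o in zip(predicted, observed)]
    differences = [p - o for p, o in zip(predicted, observed)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)   # conventional limits of agreement

    # Least-squares line of the differences on the averages.
    x_bar = mean(averages)
    sxx = sum((x - x_bar) ** 2 for x in averages)
    if sxx == 0:
        raise ValueError("averages are all identical; slope is undefined")
    sxy = sum((x - x_bar) * (d - bias) for x, d in zip(averages, differences))
    syy = sum((d - bias) ** 2 for d in differences)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return {
        "bias": bias,
        "limits": (bias - spread, bias + spread),
        "slope": slope,
        "r_squared": r_squared,
    }


# Hypothetical predicted and observed event counts in four risk groups.
result = bland_altman([1, 2, 3, 4], [1, 3, 5, 7])
print(result)
```

A slope near zero with a low R-square means the disagreement does not grow or shrink with the level of risk; in the toy data the model falls further behind the observed counts as risk rises, so the slope is clearly negative.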

Table 5. Hosmer-Lemeshow Statistics for Goodness-of-Fit Testing of Various Point-Based Models
Point-based Chi-square d.f. P value
Clinical model 6.2 8 0.63
Total cholesterol model 6.3 10 0.79
LDL cholesterol model 10.2 10 0.42
Framingham chart 9.9 7 0.19
PROCAM chart 11.3 8 0.19
Hisayama model 6.6 6 0.36
Suita model 5.2 8 0.74
Japan Public Health Center Study 7.6 8 0.47
Korean Heart Study 13.3 8 0.10

d.f., degrees of freedom; LDL, low-density lipoprotein.
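For readers who want to see how a Hosmer-Lemeshow chi-square such as those in Table 5 is formed, the Python sketch below groups participants by predicted risk and compares observed with expected event counts. It is a simplified illustration: the grouping rule, function name and data are assumptions, and the P value is left to a statistics library (for example the chi-square survival function in SciPy) because the article does not state how the degrees of freedom were chosen.

```python
def hosmer_lemeshow(risks, events, groups=10):
    """Hosmer-Lemeshow chi-square over equal-sized groups of sorted risk.

    Returns (chi_square, number_of_groups_used). Groups whose mean predicted
    risk is exactly 0 or 1 are skipped because their variance term is zero.
    """
    pairs = sorted(zip(risks, events), key=lambda pair: pair[0])
    n = len(pairs)
    chi_square = 0.0
    used = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(r for r, _ in chunk)   # expected number of events
        observed = sum(e for _, e in chunk)   # observed number of events
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance == 0:
            continue
        chi_square += (observed - expected) ** 2 / variance
        used += 1
    return chi_square, used


# Hypothetical cohort: 10 low-risk (0.1) and 10 high-risk (0.5) participants.
risks = [0.1] * 10 + [0.5] * 10
well_calibrated = [1] + [0] * 9 + [1] * 5 + [0] * 5    # 1 and 5 events
too_many_events = [1] * 3 + [0] * 7 + [1] * 5 + [0] * 5  # 3 and 5 events

print(hosmer_lemeshow(risks, well_calibrated, groups=2))
print(hosmer_lemeshow(risks, too_many_events, groups=2))
```

The degrees of freedom are commonly taken as the number of groups minus 2 when a model is tested on the data it was fitted to, while some authors use the number of groups for external validation; the d.f. values in Table 5 range from 6 to 10, so the sketch returns the group count and leaves that choice to the reader.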

Figure 3.

Observed and expected numbers of the study participants for the Suita model (Upper) and the Korean Heart model (Lower).

Discussion

Major Findings

Our study clearly demonstrated that the various CAD prediction models were consistent and performed well, and the discriminative ability and calibration of the various models were similar in a national representative cohort of Taiwanese.

Comparison With Other Studies

In the Chinese Multi-Provincial Cohort study, which followed 30,121 Chinese adults aged 35–64 years between 1992 and 2002, Liu et al examined the Framingham model and found that it overestimated the risk for Chinese adults; its performance improved only after recalibration using the mean values of risk factors and the CAD incidence rates.28 Because of the Framingham model’s overestimation problem, other strategies, such as recalibration, were proposed.29 Although the recalibration technique improved the goodness-of-fit of the estimated model, the problem of potential misclassification may make its application difficult.30

Some important factors, such as ethnicity, socioeconomic status and family history, are absent from the Framingham equations, and recalibration and adjustment did not overcome the overestimation problem well.31 In addition, some biomarkers, such as non-high-density lipoprotein cholesterol and apolipoproteins, were incorporated into the Canadian Cardiovascular Society guideline,32 and a family history of premature CAD and chronic kidney disease status should be considered as risk factors for dyslipidemia treatment and primary prevention.32

Our study demonstrated consistent calibration of the various prediction models in a Taiwanese adult population, as shown by the goodness-of-fit tests, and these models provided sufficient discrimination performance, with AUC values >0.80. Among these 10 models, NIPPON DATA80 produced a substantial change in the κ value, which may have been caused by its stratified chart scoring approach and by the NIPPON DATA80 chart predicting the risk of coronary death. In addition, the Hisayama study included both coronary and stroke risk, which led to overestimation of the predicted risk. The Suita model had the best discrimination and calibration measures (a higher area under the ROC curve and the lowest Hosmer-Lemeshow chi-square value), which may be attributed to similar covariates in the model. We found that the Korean Heart Study performed well for calibration measures because of sex-specific risk prediction. The Framingham model is known to overestimate the coronary risk in Asian-Pacific countries because of the relatively lower risk in this region. Our findings showed high consistency between the predicted and observed numbers in the Suita model by Bland-Altman plot, which indicated good performance among Taiwanese adults.

Ethnicity Difference

Asian populations showed a great burden of hypertension, lower high-density lipoprotein cholesterol, and higher body fat in a normal body mass index according to a general survey in the USA,33 and the heterogeneity within Asian populations may be related to the regional CAD occurrence patterns.

A brief review by Dr. Kokubo showed that several lifestyle factors differ between Western and East Asian countries, especially sodium intake, dietary patterns, physical activity, smoking, obesity and drinking habits.34 Therefore, emphasizing locally applicable prediction models in Asian-Pacific countries is an important task in the primary prevention of cardiovascular disease.

Clinical and Public Health Implications

Our clinical and cholesterol-based models, which were derived from community-based cohort data,4 performed well compared with the Japanese and Korean models, indicating similar risk profiles among these Asian-Pacific countries. The Framingham model and the PROCAM chart did not outperform the other models among Taiwanese adults. This variation in model performance may be attributed to the heterogeneity of population characteristics and the baseline risk for each model.9,10

Other available models, such as the QRISK and QRISK2 models for a UK cohort, are useful tools for predicting CAD.35,36 The QRISK2-2011 version was developed for better discrimination and calibration capability using an extensive risk profile, including self-assigned ethnicity, age, sex, smoking, systolic blood pressure, ratio of total to high-density lipoprotein cholesterol, body mass index, family history of premature CAD, Townsend deprivation score, hypertension medication, type 2 diabetes, renal disease, atrial fibrillation and rheumatoid arthritis history.37 However, such an extensive risk profile assessment may be a burden in a primary prevention screening program.

The SCORE (Systematic Coronary Risk Evaluation) scale model is applied to assess fatal cardiovascular disease events, stratified into high-risk and low-risk countries; however, its feasibility was limited among contemporary Czech and Polish adults.38

Study Strengths and Limitations

This study has several strengths. First, the extensive follow-up strategy based on the NHIRD provided validated outcome ascertainment, and the potential for misclassification of diagnosis was reduced. Second, detailed clinical information was available for applying the prediction models. Third, we used κ statistics and calibration measures to assess the reliability and consistency of these models, and the sensitivity analysis of deleting 1 model at a time provided insight into how each model differed from the others. However, there were several limitations. First, clinical information was available from a single baseline measurement only, and no time-dependent covariates were included. Second, the relatively low number of CAD events in the cohort may have reduced the statistical power of the comparisons, although we provided substantial person-years of follow-up. Finally, details of co-medications and of specific high-risk groups, such as individuals with chronic kidney disease or diabetes mellitus, were not examined in this study.

Conclusions

Our study demonstrated that the various prediction models provided consistent and validated tools for predicting CAD risk among adult Taiwanese. Further intervention trials to evaluate the efficacy of these models in the general population are warranted.

Sources of Funding

This study was supported by grants from the Ministry of Science and Technology, Taiwan (MOST 106-2314-B-002-158-MY3, MOST 106-3114-B-002-001-, MOST 103-2314-B-002-135-MY3) and National Taiwan University Hospital (105-S3120, 106-S3453).

Supplementary Files

Supplementary File 1

Figure S1. Bland-Altman plots of the predicted and observed numbers (the average value as x-axis, the difference between predicted and observed values as y-axis) of selected prediction models.

Table S1. Summary of 10 CAD prediction model components, including points, coefficients and related covariates

Please find supplementary file(s);

http://dx.doi.org/10.1253/circj.CJ-17-0910

© 2018 THE JAPANESE CIRCULATION SOCIETY