Journal of Atherosclerosis and Thrombosis
Online ISSN : 1880-3873
Print ISSN : 1340-3478
ISSN-L : 1340-3478
Original Article
Development and Validation of a Cardiovascular Disease Risk Prediction Model for the Japanese Working Population: The Japan Epidemiology Collaboration on Occupational Health Study
Huan Hu, Tohru Nakagawa, Toru Honda, Shuichiro Yamamoto, Takeshi Kochi, Hiroko Okazaki, Toshiaki Miyamoto, Takayuki Ogasawara, Naoki Gommori, Makoto Yamamoto, Maki Konishi, Yosuke Inoue, Isamu Kabe, Seitaro Dohi, Tetsuya Mizoue
2025 Volume 32 Issue 3 Pages 334-344

Abstract

Aims: This study aimed to develop a cardiovascular disease (CVD) risk model using data from a large occupational cohort.

Methods: A risk prediction model was developed using the routine health checkup data of 96,117 Japanese employees (84.0% men) who were 30–64 years of age and had no CVD at baseline. Cox proportional hazards regression models were employed to develop a risk model for assessing the 10-year CVD risk. Measures of discrimination and calibration were used to assess the predictive performance of the model and internal validation was used to examine potential overfitting.

Results: During a mean follow-up period of 6.7 years (range, 0.1–11.0 years), 422 cases of incident CVD were confirmed. The final model, which included predictor variables of age, smoking, diabetes, systolic blood pressure, and low- and high-density lipoprotein cholesterol levels, demonstrated a good predictive ability (Harrell’s C-statistic, 0.796; 95% confidence interval, 0.775–0.817) with excellent calibration between observed and predicted values. Internal validation revealed minimal overfitting.

Conclusions: The developed model can accurately predict the 10-year CVD risk. Because it is based on routine health checkup data, the prediction model can be easily implemented in the workplace. Further studies are required to assess the external validity and transferability of the proposed CVD risk model.

Introduction

Cardiovascular disease (CVD) remains the leading cause of global mortality, accounting for 32% of all deaths in 2019 1). In Japan, more than one-fifth of premature deaths in the working population have been attributed to CVD2). Apart from the loss of life, premature deaths resulting from CVD have a considerable impact on work productivity, thereby affecting the wider economy and society3, 4). The early identification of workers at high risk for CVD is important for the prompt implementation of targeted measures to prevent or delay the disease onset in the workplace.

Over the past decades, numerous CVD risk prediction models have been developed for diverse populations worldwide5). However, the direct application of these risk models to populations other than those used in model development may lead to potential overestimation or underestimation of the risk due to variations in risk profiles6, 7). In Japan, several risk prediction models for CVD have been developed based on the data obtained for cohorts established in the 1980s and 1990s8-11). However, since then, significant changes in cardiovascular risk profiles, such as a steady decline in smoking prevalence and a marked increase in the prevalence of diabetes, have been documented in the Japanese population12-14). Moreover, the previous cohorts were primarily composed of community-dwelling middle-aged and older individuals8-11). Therefore, existing CVD risk models based on the data derived from these cohorts may have limited applicability to a contemporary, relatively young, and active working population.

Aim

In 2012, we initiated a cohort study to investigate health status and its determinants among workers through yearly health checkups, subsequently collecting data on CVD and other hard outcomes. The current study sought to develop a CVD risk model using data from this large occupational cohort with the ultimate goal of potentially applying the model in occupational health settings.

Methods

Setting

This cohort study used data from the Japan Epidemiology Collaboration on Occupational Health (J-ECOH) Study, an ongoing multi-company study of workers in Japan. The participating companies span various industries, including electrical machinery and apparatus manufacturing; steel, chemical, and non-ferrous metal manufacturing; automobile and instrument manufacturing; plastic product manufacturing; and healthcare. We collected annual health checkup data from January 2008 to March 2023, although a few companies withdrew from the study during fiscal years 2014–2017. The participating companies initiated the CVD registry in April 2012. The details of the J-ECOH Study and CVD registration have been described previously15).

Before data collection, the J-ECOH Study was announced at each participating company through posters. Participants could refuse participation (opt-out); verbal or written informed consent was not obtained. The study protocol was approved by the Ethics Committee of the National Center for Global Health and Medicine, Japan (NCGM-G-001140).

Analytic Cohort

The current study utilized data from 11 companies, all of which had health checkup data from 2011 onwards. Among them, 8 companies provided annual health checkup data until fiscal year 2022, while 1 each provided data until fiscal years 2015, 2016, and 2017. This study employed an open cohort design, continuously enrolling participants who met the eligibility criteria between fiscal years 2011 and 2022. Participants were eligible for the study if they underwent 1 or more annual health checkups between fiscal years 2011 and 2022 and were between 30 and 64 years of age, with the initial checkup data within this period used as the baseline. Among 127,391 eligible participants, we excluded those with a history of CVD at baseline (n=2,628), those with missing data on any predictor variable (n=21,553), those who did not attend subsequent health checkups, and those who lacked information on CVD, mortality, or long-term sick leave (n=7,093). Finally, 96,117 participants (80,703 men and 15,414 women) were included in this study.
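The participant flow described above can be checked with simple arithmetic. The following is a minimal sketch using only the counts reported in this section; the variable names and exclusion labels are illustrative and are not taken from the study's analysis code.

```python
# Participant flow, using the counts reported in the Analytic Cohort section.
# Variable names and exclusion labels are illustrative, not from the study's code.
eligible = 127_391

exclusions = {
    "history of CVD at baseline": 2_628,
    "missing data on any predictor variable": 21_553,
    "no subsequent checkup or no outcome information": 7_093,
}

# Subtract every exclusion from the eligible population.
analytic_n = eligible - sum(exclusions.values())

# Sex breakdown reported for the analytic cohort.
men, women = 80_703, 15_414
percent_men = round(100 * men / analytic_n, 1)

print(f"Analytic cohort: {analytic_n:,} participants ({percent_men}% men)")
```

The exclusions account for the full difference between the eligible population and the analytic cohort, and the share of men agrees with the 84.0% reported in the abstract.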

Health Checkup

The annual health checkups included anthropometric measurements, physical examinations, laboratory tests, and self-reported questionnaires covering medical history and lifestyle factors. Smoking status was assessed using a self-administered questionnaire. Blood pressure was recorded using an automatic mercury sphygmomanometer with the participant in a sitting position. Plasma glucose levels were assessed using the enzymatic or glucose oxidase peroxidative electrode method. The glycated hemoglobin (HbA1c) level was determined using a latex agglutination immunoassay, high-performance liquid chromatography, or enzymatic method. Triglyceride, low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C) levels were measured using enzymatic methods. All laboratories involved in health checkups for participating companies received satisfactory scores (rank A or a score >95 out of 100) from external quality control agencies.

Predictor Variables

The variables, which were selected based on their widespread use in previous CVD risk models8-11) and their ready availability in our study, included sex, age (years), current smoking status (yes or no), diabetes status (yes or no), systolic blood pressure (mmHg), LDL-C levels (mg/dL), and HDL-C levels (mg/dL). Diabetes was defined as meeting at least 1 of the following criteria: a fasting plasma glucose level ≥ 126 mg/dL, a random plasma glucose level ≥ 200 mg/dL, an HbA1c level ≥ 6.5%, or a self-report of currently receiving medical treatment for diabetes.
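The diabetes definition above is a logical OR of four criteria. It can be sketched as a small function, assuming glucose values in mg/dL and HbA1c in percent; the function name, argument names, and the use of `None` for unmeasured values are assumptions made for illustration.

```python
from typing import Optional


def has_diabetes(
    fasting_glucose: Optional[float] = None,  # mg/dL, None if not measured
    random_glucose: Optional[float] = None,   # mg/dL, None if not measured
    hba1c: Optional[float] = None,            # %, None if not measured
    on_treatment: bool = False,               # self-reported current treatment
) -> bool:
    """Return True if at least one of the four criteria in the text is met."""
    if fasting_glucose is not None and fasting_glucose >= 126:
        return True
    if random_glucose is not None and random_glucose >= 200:
        return True
    if hba1c is not None and hba1c >= 6.5:
        return True
    return on_treatment


# Hypothetical participants, for illustration only.
print(has_diabetes(fasting_glucose=110, hba1c=6.7))  # HbA1c criterion met
print(has_diabetes(fasting_glucose=110, hba1c=5.8))  # no criterion met
```

Treating an unmeasured value as "criterion not met" is a design choice of this sketch; the text does not state how partially missing glucose data were handled.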

Outcome

Incident CVD events, including fatal and non-fatal myocardial infarction and stroke, were ascertained from April 2012 to March 2023. For fatal cases, the cause of death was determined based on the report from the collaborating occupational physician, which included a copy of the death certificate provided by the bereaved family (54%), information gathered from the bereaved family or colleagues (16%), and additional sources or missing data (source not specified, 18%; missing, 12%). For nonfatal cases, the diagnosis of each CVD event relied on data from medical certificates issued by the treating physician and submitted to the company through the worker (87%), confirmation with the treating physician (2%), or a self-report (7%). Data were missing in 4% of cases.

Statistical Analysis

The baseline characteristics of the study participants were described as means for continuous variables and percentages for categorical variables. Person-time was calculated from March 31, 2012 (i.e., 1 day before the initiation of the CVD registration for baseline examinations in the fiscal year 2011) or from the date of the baseline examination for participants entering the study in the fiscal year 2012 or later and continued until the earliest of the following events: the date of the first CVD event, individual censoring determined based on available information (annual health checkup data, sick leave, retirement, or death), or the end of the follow-up period (typically, March 31, 2023, for most companies).

A risk prediction model was developed using a Cox proportional hazards regression analysis with a backward selection procedure to determine the predictors (P<0.05). The predictive performance was evaluated by measures of discrimination, calibration, and overall performance, as suggested by the Strengthening Analytical Thinking for Observational Studies initiative16). Two different discrimination measures were employed to assess the predictive accuracy of the model: Uno’s time-dependent area under the receiver operating characteristic curve (AUROC) calculated at the 10-year mark and Harrell’s C-statistic. Calibration was evaluated using the following 2 methods: visually, by plotting the predicted 10-year CVD risk against the observed risk in a calibration plot, and quantitatively, using both mean and weak calibrations in both fixed-time and time-range approaches. Mean and weak calibration values closer to 1 indicate better calibration. The overall performance of the model was assessed using the Brier score, which was calculated as the mean squared difference between the observed and predicted event risks at 10 years; a lower score closer to zero indicated superior performance. Internal validation was performed to estimate optimism (indicating the level of model overfitting) and correct measures of predictive performance by bootstrapping 200 samples; however, because of high computational resource demands, only 100 samples were used for calibration indices.
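To illustrate the time-range discrimination measure named above, here is a minimal sketch of Harrell’s C-statistic for right-censored data. It is not the SAS procedure used in the study: the function name `harrell_c`, the quadratic-time loop, the choice to skip pairs with tied times, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def harrell_c(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by the model.

    A pair (i, j) is comparable when participant i had an observed event and
    participant j was still under observation after i's event time. The pair
    is concordant when i, who failed earlier, has the higher predicted risk.
    Ties in predicted risk count as half. Pairs with tied times are skipped,
    which is a simplification of this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored participant cannot be the earlier failure
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up years, event indicator, predicted risk.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.1, 0.5, 0.7]
print(harrell_c(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the scale on which the C-statistics in this article should be read.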

The multivariable prediction model was transformed into a simplified scoring system. The scoring methods are presented in Supplementary Table 2. The agreement between the 10-year CVD probability predicted by the multivariable model and the simplified score was assessed using Spearman’s rank correlation and bivariate linear regression to compare the model-predicted probability with the score-based estimate. We also attempted to create sex-specific risk prediction models; however, due to the limited number of women, we were only able to create a male-specific risk model. All statistical analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC, USA). Two-sided P values of <0.05 were considered statistically significant.

Supplementary Table 1. Lipid profiles and chronic disease treatment among study participants

Without CVD Incident CVD P
N 95,695 422
Total cholesterol, mg/dL, mean (SD) 200.6 (33.4) 209.3 (36.7) <0.001
<200 mg/dL, % <0.001
200-239 mg/dL, %
≥ 240 mg/dL, %
High-density lipoprotein cholesterol, mg/dL, mean (SD) 58.9 (15.0) 54.6 (14.9) <0.001
<40 mg/dL, % 6.8 12.8 <0.001
40-49 mg/dL, % 22.5 32.7
50-59 mg/dL, % 27.8 26.1
≥ 60 mg/dL, % 42.9 28.4
Non-high-density lipoprotein cholesterol, mg/dL, mean (SD) 142.4 (34.4) 155.4 (38.9) <0.001
<130 mg/dL, % 37.8 25.3 <0.001
130-149 mg/dL, % 22.8 18.9
150-169 mg/dL, % 18.8 20.8
≥ 170 mg/dL, % 20.6 35.0
Low-density lipoprotein cholesterol, mg/dL, mean (SD) 120.5 (29.9) 130.2 (34.0) <0.001
<100 mg/dL, % 25.2 18.3 <0.001
100-129 mg/dL, % 38.1 31.0
130-159 mg/dL, % 26.3 31.0
≥ 160 mg/dL, % 10.4 19.7
Triglycerides, mg/dL, mean (SD) 122.0 (97.5) 156.2 (129.2) <0.001
<150 mg/dL, % 76.3 62.1 <0.001
150-199 mg/dL, % 11.5 16.4
200-499 mg/dL, % 11.3 19.4
≥ 500 mg/dL, % 0.9 2.1
Hypertension, % 19.3 48.8 <0.001
Antihypertensive treatment, % 9.3 23.5 <0.001
Dyslipidemia, % 44.8 66.1 <0.001
Lipid-lowering treatment, % 5.7 7.8 0.06
Antidiabetic treatment, % 3.3 12.1 <0.001

Data from 78,897 people were available.

Results

Table 1 presents the baseline characteristics of the participants. The mean age of the participants was 44.2 (9.4) years, and the majority of participants were men (84.0%). During a mean follow-up of 6.7 years (range, 0.1–11.0 years), with approximately 37% of the participants followed for ≥ 10 years, 422 participants developed CVD (fatal, n=79; nonfatal, n=343). The incidence rate of CVD was 0.7 per 1,000 person-years. Individuals who developed CVD were more likely to be current smokers, have diabetes and hypertension, and have higher systolic blood pressure and LDL-C levels than those who did not develop CVD. Further details regarding the lipid levels and chronic disease treatment are provided in Supplementary Table 1.
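The reported incidence rate can be roughly reproduced from the figures in this paragraph. The sketch below approximates total person-years as the number of participants multiplied by the mean follow-up; this approximation and the variable names are assumptions of the example, not a description of how the authors computed the rate.

```python
# Approximate incidence rate from the summary figures reported in the text.
# Assumption: total person-years is taken as N x mean follow-up.
n_participants = 96_117
mean_follow_up_years = 6.7
incident_cases = 422

person_years = n_participants * mean_follow_up_years
rate_per_1000_py = 1000 * incident_cases / person_years

print(f"{rate_per_1000_py:.2f} per 1,000 person-years")
```

Rounded to one decimal place, the result agrees with the 0.7 per 1,000 person-years stated above.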

Table 1. Baseline characteristics of the study participants

Total Without CVD Incident CVD P
N 96,117 95,695 422
Age (years) 44.2 (9.4) 44.2 (9.4) 48.6 (7.4) <0.001
Men, % 84.0 83.9 90.8 <0.001
Current smoker, % 33.5 33.4 52.6 <0.001
Systolic blood pressure (mmHg) 121.2 (14.9) 121.2 (14.9) 131.7 (15.4) <0.001
Low-density lipoprotein cholesterol (mg/dL) 120.6 (29.9) 120.5 (29.9) 130.2 (34.0) <0.001
High-density lipoprotein cholesterol (mg/dL) 58.9 (15.0) 58.9 (15.0) 54.6 (14.9) <0.001
Diabetes, % 7.4 7.3 23.5 <0.001

Although sex was a potential predictor, it did not meet the inclusion criteria for variable selection and was excluded. Table 2 presents the coefficients associated with each CVD predictor. The risk of CVD was positively associated with age, smoking, diabetes, systolic blood pressure, and LDL-C levels and inversely associated with HDL-C levels.

Table 2. Multivariate regression coefficients (standard errors) of the CVD risk prediction model

β (SE) Hazard ratio (95% CI) P
Age (years) 0.060 (0.006) 1.06 (1.05, 1.08) <0.001
Current smoker
No Reference Reference
Yes 0.720 (0.099) 2.05 (1.69, 2.50) <0.001
Systolic blood pressure (mmHg) 0.035 (0.003) 1.04 (1.03, 1.04) <0.001
Low-density lipoprotein cholesterol (mg/dL) 0.007 (0.002) 1.01 (1.00, 1.01) <0.001
High-density lipoprotein cholesterol (mg/dL) -0.009 (0.004) 0.99 (0.98, 1.00) 0.011
Diabetes
No Reference Reference
Yes 0.799 (0.120) 2.22 (1.76, 2.81) <0.001
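
The hazard ratios in Table 2 follow from the coefficients by exponentiation. The following minimal sketch assumes a Wald-type 95% confidence interval, exp(β ± 1.96 × SE); the function name is hypothetical. Because the published β and SE are rounded to three decimals, recomputed limits may differ from the table in the last digit.

```python
import math
from typing import Tuple


def hazard_ratio(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (HR, lower limit, upper limit) for a Cox coefficient and its SE."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Coefficients and standard errors as published in Table 2.
table2 = {
    "Age (per year)": (0.060, 0.006),
    "Current smoker": (0.720, 0.099),
    "Systolic blood pressure (per mmHg)": (0.035, 0.003),
    "Diabetes": (0.799, 0.120),
}

for name, (beta, se) in table2.items():
    hr, lo, hi = hazard_ratio(beta, se)
    print(f"{name}: HR {hr:.2f} ({lo:.2f}, {hi:.2f})")

# A per-unit coefficient scales linearly on the log-hazard scale, so the
# hazard ratio for a 10 mmHg higher systolic blood pressure is exp(10 * beta).
hr_sbp_per_10 = math.exp(10 * 0.035)
```

Rescaling in this way can make the small per-unit hazard ratios for continuous predictors easier to interpret.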

Table 3 presents the discrimination, calibration, and overall performance. The model showed a good discriminative ability (apparent Harrell’s C-statistic = 0.796; apparent AUROC = 0.798). Furthermore, the calibration of the development data was excellent, with both mean and weak calibration values approaching or equal to 1. The calibration plot (Fig.1) indicated good agreement between the observed outcomes and predictions, with no obvious differences, except for the highest and third highest risk groups. The top 2 deciles of the predicted risk identified 53% of individuals who experienced the first CVD event during follow-up (sensitivity). The proportion of individuals without CVD events who were not in the top 2 deciles of the predicted risk was 80% (specificity). The Brier score was 0.006, indicating adequate overall model performance. Bootstrap internal validation showed little model overfitting. This was reflected in similar apparent and optimism-adjusted performance statistics (Table 3).

Table 3. Performance of the CVD risk model at 10 years

Performance measure Apparent validation Internal validation (Optimism corrected)
Discrimination
Harrell’s C-statistic (time range) 0.796 (0.775, 0.817) 0.794
AUROC (fixed time) 0.798 (0.780, 0.817) 0.796
Calibration
Time range
Mean calibration 1 0.999
Weak calibration (slope) 1 0.989
Fixed time
Mean calibration 1 0.835
Weak calibration (slope) 1 0.988
Overall
Brier score 0.006 (0.006, 0.007) 0.006

AUROC = area under the receiver-operating characteristic curve.

Fig.1. Calibration plot of the predicted 10-year CVD risk within deciles against the observed (Kaplan–Meier) 10-year CVD risk

The 2 lowest deciles were combined due to the small number of cases.

The risk prediction model was translated into risk scores as shown in Fig.2. The point allocation method for determining scores is presented in Supplementary Table 2. The total score ranged from 0 to 30 points. The 10-year CVD risk predicted by the risk score was well correlated with the predictions derived from the model (Spearman’s correlation coefficient r=0.966; regression coefficient β=1.018 [95% CI 1.016, 1.019]).

Fig.2.

Risk score for estimating the 10-year CVD risk

Supplementary Table 2. Determination of points for the risk score

Variable Levels Median Assigned value Difference from reference (a) Coefficient (b) Weight (c = ab) Point (c/0.1900)
Age 30-39 34 35 (reference) 0 0.0600 0 0
40-44 42 42 7 0.4200 2
45-49 47 47 12 0.7200 4
50-54 52 52 17 1.0200 5
55-59 57 57 22 1.3200 7
60-64 61 60 25 1.5000 8
Smoking No 0 0 (reference) 0 0.7199 0 0
Yes 1 1 1 0.7199 4
Diabetes No 0 0 (reference) 0 0.7994 0 0
Yes 1 1 1 0.7994 4
SBP <120 110 110 (reference) 0 0.0352 0 0
120-129 124 125 15 0.5280 3
130-139 134 135 25 0.8800 5
140-149 144 145 35 1.2320 6
150-159 153 155 45 1.5840 8
≥ 160 167 165 55 1.9360 10
LDL-C <140 109 110 (reference) 0 0.0067 0 0
≥ 140 155 155 45 0.3015 2
HDL-C ≥ 60 70 70 (reference) 0 -0.0095 0 0
40-59 50 50 -20 0.1900 1
<40 37 35 -35 0.3325 2

SBP, systolic blood pressure; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol.

Continuous predictors were transformed into categorical variables to calculate risk scores. Age was divided into six groups, with the youngest group serving as the reference. Values of 35, 42, 47, 52, 57, and 60 were assigned to their respective age groups. Similarly, levels of SBP, HDL-C, and LDL-C were categorized based on predefined intervals. For these variables, each category was assigned the nearest multiple of 5 to its median value. Regarding other categorical variables, the healthier of the dichotomous categories was designated as the reference (assigned a value of 0), while the unhealthier category (e.g., diabetes, smoking) was assigned a value of 1.

The points for each category (j) of each predictor (i) were determined using the formula:

$$\text{Point}_{ij} = \frac{\beta_i \,(W_{ij} - W_{i,\mathrm{REF}})}{\text{Constant}}$$

Here, $\beta_i$ is the coefficient estimate for predictor i in the risk prediction model, and $W_{ij}$ and $W_{i,\mathrm{REF}}$ are the assigned values for category j of predictor i and for its reference category, respectively. Thus, $W_{ij} - W_{i,\mathrm{REF}}$ is the distance of each category from its reference category in the predictor's original units. The constant was set to 0.1900, the weight for a 20 mg/dL decrement in HDL-C, which is the smallest nonzero value of $\beta_i (W_{ij} - W_{i,\mathrm{REF}})$ across all categories. The points listed in Supplementary Table 2 correspond to this quotient rounded to the nearest integer.
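
The point allocation can be reproduced from the coefficients and assigned values listed in Supplementary Table 2. The following is a minimal sketch, assuming the constant of 0.1900 stated in the text and rounding to the nearest integer; the dictionary layout and function name are illustrative.

```python
CONSTANT = 0.1900  # weight for a 20 mg/dL decrement in HDL-C, per the text

# Coefficients (b) from Supplementary Table 2.
BETAS = {
    "age": 0.0600, "smoking": 0.7199, "diabetes": 0.7994,
    "sbp": 0.0352, "ldl": 0.0067, "hdl": -0.0095,
}

# Assigned value per level; the first entry of each predictor is the reference.
ASSIGNED = {
    "age": {"30-39": 35, "40-44": 42, "45-49": 47, "50-54": 52, "55-59": 57, "60-64": 60},
    "smoking": {"No": 0, "Yes": 1},
    "diabetes": {"No": 0, "Yes": 1},
    "sbp": {"<120": 110, "120-129": 125, "130-139": 135,
            "140-149": 145, "150-159": 155, ">=160": 165},
    "ldl": {"<140": 110, ">=140": 155},
    "hdl": {">=60": 70, "40-59": 50, "<40": 35},
}


def points(predictor: str, level: str) -> int:
    """Points = beta * (assigned value - reference value) / constant, rounded."""
    values = ASSIGNED[predictor]
    reference = next(iter(values.values()))  # first listed level is the reference
    weight = BETAS[predictor] * (values[level] - reference)
    return round(weight / CONSTANT)


def total_score(profile: dict) -> int:
    """Sum the points over all six predictors for one participant."""
    return sum(points(predictor, level) for predictor, level in profile.items())


# Hypothetical worker: 52 years old, smoker, no diabetes, SBP 134, LDL-C 150, HDL-C 45.
example = {"age": "50-54", "smoking": "Yes", "diabetes": "No",
           "sbp": "130-139", "ldl": ">=140", "hdl": "40-59"}
print(total_score(example))
```

Taking the highest-risk level of every predictor gives the maximum possible score, which matches the upper end of the 0 to 30 range reported in the Results.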

The risk prediction model for men exhibited good discriminative ability and reliable calibration (Supplementary Table 3 and Supplementary Fig.1), similar to the risk prediction model for both men and women. Detailed risk scores for men are shown in Supplementary Fig.2.

Supplementary Table 3. Performance of the CVD risk model at 10 years (men only)

Performance measure Apparent validation Internal validation (Optimism corrected)
Discrimination
Harrell’s C-statistic (time range) 0.797 (0.776, 0.818) 0.795
AUROC (fixed time) 0.801 (0.775, 0.827) 0.795
Calibration
Time range
Mean calibration 1 0.996
Weak calibration (slope) 1 0.991
Fixed time
Mean calibration 1 0.813
Weak calibration (slope) 1 0.990
Overall
Brier score 0.007 (0.006, 0.008) 0.007

AUROC = area under the receiver-operating characteristic curve.

Supplementary Fig.1. Calibration plot of predicted 10-year CVD risk within deciles against the observed (Kaplan–Meier) 10-year CVD risk (men only)

The two lowest deciles were combined due to the small number of cases.

Supplementary Fig.2.

Risk score for estimating the 10-year CVD risk (men only)

Discussion

In this contemporary cohort study involving Japanese workers, we developed a CVD risk prediction model with good discriminatory performance and excellent calibration. To the best of our knowledge, this is the first study to develop a CVD risk model for the working population in Japan.

The proposed model includes well-established CVD risk factors such as smoking, systolic blood pressure, blood lipid levels, and diabetes. These risk factors are also commonly used in other CVD risk-prediction models. A meta-analysis revealed that 24% of models considered sex, while 69% were exclusively designed for either men or women and 7% did not include sex5). In our multivariable model, we observed that sex was not significantly associated with the risk of CVD, which is likely due to the fact that most predictors of CVD risk (e.g., smoking, blood lipid levels, and diabetes) are correlated with sex. Similarly, several previous studies have reported a non-significant association between sex and CVD, after accounting for the aforementioned traditional risk factors17, 18). Moreover, we found no evidence of interactions between sex and these risk factors, suggesting a similar relationship between these risk factors and CVD in men and women. It is noteworthy that all predictors integrated into this model can be obtained through periodic health checkups for employees in Japan. Therefore, this model can be easily implemented in the workplace to identify high-risk individuals and to introduce targeted preventive measures without additional efforts to gather other predictive information.

Our study demonstrated that the risk of CVD can be accurately predicted in the working population using data that are readily available from annual health checkups. The CVD risk prediction model showed good predictive ability, with an AUROC of approximately 0.80, which aligns with the ranges reported in previous studies of the general population in Japan (0.78–0.81) and other countries (0.61–1.00)5, 8-11). Notably, the discrimination ability of the current model was comparable to that of existing CVD risk models developed for working populations19, 20). For instance, a CVD risk model developed for a northern Italian working male population that incorporated lifestyle- and job-related conditions demonstrated an AUROC of 0.75 19). In a large cohort of US male health professionals, the CVD risk model including lifestyle risk factors yielded an AUROC of 0.77 at 10 years20). In addition, both calibration indices (mean and weak calibration) suggested by the Strengthening Analytical Thinking for Observational Studies initiative and the calibration plot indicated that the model was well calibrated. Further studies are required to assess the performance of our CVD risk model in external populations.

Both national and international guidelines have documented that lifestyle modifications, such as smoking cessation, physical activity, and a balanced diet, together with adherence to medical treatments for conditions such as hypertension and dyslipidemia, can reduce the risk of CVD21, 22). In the present study, we observed that more than one in three male workers was a current smoker, and chronic conditions such as hypertension (1 in 5 people) and dyslipidemia (1 in 2 people) were also common in the working population. However, fewer than half of those with these conditions were receiving medical treatment. This highlights a significant opportunity to reduce CVD risk in the working population through evidence-based interventions. The risk prediction model developed in this study can be used to identify high-risk individuals for targeted interventional studies aimed at reducing the risk of CVD.

Strengths and Limitations

The strengths of this study include the contemporaneity of the cohort and the substantial baseline sample size of the workers. Furthermore, all predictors used in our model were routinely collected during annual health checkups. However, this study has several limitations. First, the CVD registry data were mainly based on data from medical certificates written by a physician and submitted to the company by a worker. This registry primarily covers relatively severe cases as a medical certificate is mandated for long-term sick leave (≥ 2 weeks). Meanwhile, individuals with milder forms of CVD who take sick leave for less than 2 weeks are not required to submit a medical certificate, which may have resulted in an underestimation of these events in our study. Second, in approximately half of the fatal CVD cases, the cause of death was confirmed by a death certificate, while the remainder relied on less reliable sources. Given the relatively small number of fatal cases, the impact of potential inaccuracies from non-certificate causes on the overall results is likely to be limited. Most nonfatal cases were identified by treating physicians’ certificates, which are generally considered reliable despite the lack of additional validation data. Third, the relatively small number of women in our cohort may have affected the performance of our prediction models for women, despite our multivariable-adjusted model not showing a significant association between sex and CVD risk. Fourth, we did not have information regarding socioeconomic status, family history of CVD, occupational risk factors, and additional lifestyle factors such as alcohol consumption, diet, and physical activity, which could enhance the performance of our model. Fifth, the open cohort design included participants with recent data for the model coefficient estimation. However, this may have led to imprecision in the 10-year survival estimates, as not all participants provided 10 years of follow-up data. To assess this, we restricted the analysis to a closed cohort of participants enrolled between fiscal years 2011 and 2012, and found minimal differences in the baseline survival function. Finally, as our study was based on a Japanese occupational cohort, caution should be exercised when applying this model to other populations.

Conclusions

This study presents a CVD risk model that was specifically developed for practical applications in assessing the risk of CVD in workplace settings. This model can be used to identify populations with an elevated risk of CVD and assist in the design of targeted workplace-based preventive primary care interventions. Further studies are required to assess the external validity and transferability of the proposed CVD risk model.

Notice of Grant Support

This work was supported by the Industrial Health Foundation, Industrial Disease Clinical Research Grants (140202-01, 150903-01, 170301-01), JSPS KAKENHI Grants (JP25293146, JP25702006, JP16H05251, JP20H03952), and NCGM Intramural Research Fund (28-Shi-1206, 30-Shi-2003, 19A1006, 21A1020, 22A1008).

Conflict of Interest

The authors declare that they have no conflicts of interest to disclose.

References
 

This article is licensed under a Creative Commons [Attribution-NonCommercial-ShareAlike 4.0 International] license.
https://creativecommons.org/licenses/by-nc-sa/4.0/