2025 Volume 32 Issue 3 Pages 334-344
Aims: This study aimed to develop a cardiovascular disease (CVD) risk model using data from a large occupational cohort.
Methods: A risk prediction model was developed using the routine health checkup data of 96,117 Japanese employees (84.0% men) who were 30–64 years of age and had no CVD at baseline. Cox proportional hazards regression models were employed to develop a risk model for assessing the 10-year CVD risk. Measures of discrimination and calibration were used to assess the predictive performance of the model and internal validation was used to examine potential overfitting.
Results: During a mean follow-up period of 6.7 years (range, 0.1–11.0 years), 422 cases of incident CVD were confirmed. The final model, which included predictor variables of age, smoking, diabetes, systolic blood pressure, and low- and high-density lipoprotein cholesterol levels, demonstrated a good predictive ability (Harrell’s C-statistic, 0.796; 95% confidence interval, 0.775–0.817) with excellent calibration between observed and predicted values. Internal validation revealed minimal overfitting.
Conclusions: The developed model can accurately predict the 10-year CVD risk. Because it is based on routine health checkup data, the prediction model can be easily implemented in the workplace. Further studies are required to assess the external validity and transferability of the proposed CVD risk model.
Cardiovascular disease (CVD) remains the leading cause of global mortality, accounting for 32% of all deaths in 2019 1). In Japan, more than one-fifth of premature deaths in the working population have been attributed to CVD2). Apart from the loss of life, premature deaths resulting from CVD have a considerable impact on work productivity, thereby affecting the wider economy and society3, 4). The early identification of workers at high risk for CVD is important for the prompt implementation of targeted measures to prevent or delay the disease onset in the workplace.
Over the past decades, numerous CVD risk prediction models have been developed for diverse populations worldwide5). However, the direct application of these risk models to populations other than those used in model development may lead to potential overestimation or underestimation of the risk due to variations in risk profiles6, 7). In Japan, several risk prediction models for CVD have been developed based on the data obtained for cohorts established in the 1980s and 1990s8-11). However, since then, significant changes in cardiovascular risk profiles, such as a steady decline in smoking prevalence and a marked increase in the prevalence of diabetes, have been documented in the Japanese population12-14). Moreover, the previous cohorts were primarily composed of community-dwelling middle-aged and older individuals8-11). Therefore, existing CVD risk models based on the data derived from these cohorts may have limited applicability to a contemporary, relatively young, and active working population.
In 2012, we initiated a cohort study to investigate health status and its determinants among workers through yearly health checkups, subsequently collecting data on CVD and other hard outcomes. The current study sought to develop a CVD risk model using data from this large occupational cohort with the ultimate goal of potentially applying the model in occupational health settings.
This cohort study used data from the Japan Epidemiology Collaboration on Occupational Health (J-ECOH) Study, an ongoing multi-company study of workers in Japan. The participating companies span various industries, such as electrical machinery and apparatus manufacturing; steel, chemical, and non-ferrous metal manufacturing; automobile and instrument manufacturing; plastic product manufacturing; and healthcare. We collected annual health checkup data from January 2008 to March 2023, although a few companies withdrew from participation during fiscal years 2014–2017. In the J-ECOH Study, the CVD registry was initiated by the participating companies in April 2012. The details of the J-ECOH Study and CVD registration have been described previously15).
Before data collection, the J-ECOH Study was announced to each participating company through posters. The participants were allowed to refuse participation (opt-out) and did not provide their verbal or written informed consent. The study protocol was approved by the Ethics Committee of the National Centre for Global Health and Medicine, Japan (NCGM-G-001140).
Analytic Cohort
The current study utilized data from 11 companies, all of which had health checkup data from 2011 onwards. Among them, 8 companies provided annual health checkup data until fiscal year 2022, while 1 each provided data until fiscal years 2015, 2016, and 2017. This study employed an open cohort design, continuously enrolling participants who met the eligibility criteria between fiscal years 2011 and 2022. Participants were eligible for the study if they underwent 1 or more annual health checkups between fiscal years 2011 and 2022 and were between 30 and 64 years of age, with the initial checkup data within this period used as the baseline. Among 127,391 eligible participants, we excluded those with a history of CVD at baseline (n=2,628), those with missing data on any predictor variable (n=21,553), those who did not attend subsequent health checkups, and those who lacked information on CVD, mortality, or long-term sick leave (n=7,093). Finally, 96,117 participants (80,703 men and 15,414 women) were included in this study.
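The participant flow described above can be checked with simple arithmetic. The following Python sketch uses only the counts reported in this section; the variable names and exclusion labels are illustrative.

```python
# Participant flow, using only the counts reported in the text.
eligible = 127_391
excluded = {
    "history of CVD at baseline": 2_628,
    "missing data on any predictor": 21_553,
    "no subsequent checkup or no outcome information": 7_093,
}

# Analytic cohort = eligible participants minus all exclusions.
analytic_n = eligible - sum(excluded.values())

men, women = 80_703, 15_414
pct_men = 100 * men / analytic_n

print(analytic_n)         # 96117
print(round(pct_men, 1))  # 84.0
```

The result agrees with the reported cohort size and with the proportion of men given in the abstract.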
Health Checkup
The annual health checkups included anthropometric measurements, physical examinations, laboratory tests, and self-reported questionnaires covering medical history and lifestyle factors. Smoking status was assessed using a self-administered questionnaire. Blood pressure was recorded using an automatic mercury sphygmomanometer with the participant in a sitting position. Plasma glucose levels were assessed using the enzymatic or glucose oxidase peroxidative electrode method. The glycated hemoglobin (HbA1c) level was determined using a latex agglutination immunoassay, high-performance liquid chromatography, or enzymatic method. Triglyceride, low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C) levels were measured using enzymatic methods. All laboratories involved in health checkups for participating companies received satisfactory scores (rank A or a score >95 out of 100) from external quality control agencies.
Predictor Variables
The variables, which were selected based on their widespread use in previous CVD risk models8-11) and their ready availability in our study, included sex, age (years), current smoking status (yes or no), diabetes status (yes or no), systolic blood pressure (mmHg), LDL-C levels (mg/dL), and HDL-C levels (mg/dL). Diabetes was defined as meeting at least 1 of the following criteria: a fasting plasma glucose level ≥ 126 mg/dL, a random plasma glucose level ≥ 200 mg/dL, an HbA1c level ≥ 6.5%, or a self-report of currently receiving medical treatment for diabetes.
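The diabetes definition above is a simple "any of four criteria" rule. A minimal Python sketch follows; the function name, the argument names, and the treatment of unmeasured values (`None` is taken as "criterion not met") are assumptions made for illustration and are not described in the study.

```python
from typing import Optional


def has_diabetes(
    fasting_glucose_mg_dl: Optional[float] = None,
    random_glucose_mg_dl: Optional[float] = None,
    hba1c_percent: Optional[float] = None,
    on_diabetes_treatment: bool = False,
) -> bool:
    """Return True if at least one of the four criteria in the text is met.

    Assumption: a measurement passed as None simply does not contribute.
    """
    return (
        (fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl >= 126)
        or (random_glucose_mg_dl is not None and random_glucose_mg_dl >= 200)
        or (hba1c_percent is not None and hba1c_percent >= 6.5)
        or on_diabetes_treatment
    )


# Example: normal fasting glucose but HbA1c at the threshold.
print(has_diabetes(fasting_glucose_mg_dl=110, hba1c_percent=6.5))  # True
```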
Outcome
Incident CVD events, including fatal and non-fatal myocardial infarction and stroke, were ascertained from April 2012 to March 2023. For fatal cases, the cause of death was determined based on the report from the collaborating occupational physician, which included a copy of the death certificate provided by the bereaved family (54%), information gathered from the bereaved family or colleagues (16%), and additional sources or missing data (source not specified, 18%; missing, 12%). For nonfatal cases, the diagnosis of each CVD event relied on data from medical certificates issued by the treating physician and submitted to the company through the worker (87%), confirmation with the treating physician (2%), or a self-report (7%). Data were missing in 4% of cases.
Statistical Analysis
The baseline characteristics of the study participants were described as means for continuous variables and percentages for categorical variables. Person-time was calculated from March 31, 2012 (i.e., 1 day before the initiation of the CVD registration for baseline examinations in the fiscal year 2011) or from the date of the baseline examination for participants entering the study in the fiscal year 2012 or later and continued until the earliest of the following events: the date of the first CVD event, individual censoring determined based on available information (annual health checkup data, sick leave, retirement, or death), or the end of the follow-up period (typically, March 31, 2023, for most companies).
A risk prediction model was developed using a Cox proportional hazards regression analysis with a backward selection procedure to determine the predictors (P<0.05). The predictive performance was evaluated by measures of discrimination, calibration, and overall performance, as suggested by the Strengthening Analytical Thinking for Observational Studies initiative16). Two different discrimination measures were employed to assess the predictive accuracy of the model: Uno’s time-dependent area under the receiver operating characteristic curve (AUROC) calculated at the 10-year mark and Harrell’s C-statistic. Calibration was evaluated using the following 2 methods: visually, by plotting the predicted 10-year CVD risk against the observed risk in a calibration plot, and quantitatively, using both mean and weak calibrations in both fixed-time and time-range approaches. Mean and weak calibration values closer to 1 indicate better calibration. The overall performance of the model was assessed using the Brier score, which was calculated as the mean squared difference between the observed and predicted event risks at 10 years; a lower score closer to zero indicated superior performance. Internal validation was performed to estimate optimism (indicating the level of model overfitting) and correct measures of predictive performance by bootstrapping 200 samples; however, because of high computational resource demands, only 100 samples were used for calibration indices.
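To make the discrimination measure concrete, the following is a minimal Python sketch of Harrell's C-statistic for right-censored data. It is not the SAS procedure used in the study; the function name and the toy data are hypothetical, and the handling of tied predictions (counted as half-concordant) is the usual convention, assumed here. A pair is usable when the participant with the shorter follow-up time had an event; the pair is concordant when that participant also has the higher predicted risk.

```python
def harrell_c(times, events, risk):
    """Harrell's C for right-censored data (O(n^2) illustrative version).

    times  : follow-up time for each participant
    events : 1 if the participant had the event, 0 if censored
    risk   : predicted risk (higher means an earlier event is expected)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored participant cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable


# Hypothetical toy data: four participants, two events.
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
print(harrell_c(times, events, [0.9, 0.3, 0.5, 0.1]))  # 1.0
print(harrell_c(times, events, [0.5, 0.3, 0.1, 0.9]))  # 0.5
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported C-statistic should be read.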
The multivariable prediction model was transformed into a simplified scoring system. The scoring methods are presented in Supplementary Table 2. The agreement between the 10-year CVD probability predicted by the multivariable model and the simplified score was assessed using Spearman’s rank correlation and bivariate linear regression to compare the model-predicted probability with the score-based estimate. We also attempted to create sex-specific risk prediction models; however, due to the limited number of women, we were only able to create a male-specific risk model. All statistical analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC, USA). Two-sided P values of <0.05 were considered statistically significant.
Characteristic | Without CVD | Incident CVD | P |
---|---|---|---|
N | 95,695 | 422 | |
Total cholesterol, mg/dL, mean (SD) * | 200.6 (33.4) | 209.3 (36.7) | <0.001 |
<200 mg/dL, % | | | <0.001 |
200-239 mg/dL, % | | | |
≥ 240 mg/dL, % | | | |
High-density lipoprotein cholesterol, mg/dL, mean (SD) | 58.9 (15.0) | 54.6 (14.9) | <0.001 |
<40 mg/dL, % | 6.8 | 12.8 | <0.001 |
40-49 mg/dL, % | 22.5 | 32.7 | |
50-59 mg/dL, % | 27.8 | 26.1 | |
≥ 60 mg/dL, % | 42.9 | 28.4 | |
Non-high-density lipoprotein cholesterol, mg/dL, mean (SD) * | 142.4 (34.4) | 155.4 (38.9) | <0.001 |
<130 mg/dL, % | 37.8 | 25.3 | <0.001 |
130-149 mg/dL, % | 22.8 | 18.9 | |
150-169 mg/dL, % | 18.8 | 20.8 | |
≥ 170 mg/dL, % | 20.6 | 35.0 | |
Low-density lipoprotein cholesterol, mg/dL, mean (SD) | 120.5 (29.9) | 130.2 (34.0) | <0.001 |
<100 mg/dL, % | 25.2 | 18.3 | <0.001 |
100-129 mg/dL, % | 38.1 | 31.0 | |
130-159 mg/dL, % | 26.3 | 31.0 | |
≥ 160 mg/dL, % | 10.4 | 19.7 | |
Triglycerides, mg/dL, mean (SD) | 122.0 (97.5) | 156.2 (129.2) | <0.001 |
<150 mg/dL, % | 76.3 | 62.1 | <0.001 |
150-199 mg/dL, % | 11.5 | 16.4 | |
200-499 mg/dL, % | 11.3 | 19.4 | |
≥ 500 mg/dL, % | 0.9 | 2.1 | |
Hypertension, % | 19.3 | 48.8 | <0.001 |
Antihypertensive treatment, % | 9.3 | 23.5 | <0.001 |
Dyslipidaemia, % | 44.8 | 66.1 | <0.001 |
Lipid-lowering treatment, % | 5.7 | 7.8 | 0.06 |
Antidiabetic treatment, % | 3.3 | 12.1 | <0.001 |
*Data from 78,897 people were available.
Table 1 presents the baseline characteristics of the participants. The mean age of the participants was 44.2 (9.4) years, and the majority of participants were men (84.0%). During a mean follow-up of 6.7 years (range, 0.1–11.0 years), with approximately 37% of the participants followed for ≥ 10 years, 422 participants developed CVD (fatal, n=79; nonfatal, n=343). The incidence rate of CVD was 0.7 per 1,000 person-years. Individuals who developed CVD were more likely to be current smokers, have diabetes and hypertension, and have higher systolic blood pressure and LDL-C levels than those who did not develop CVD. Further details regarding the lipid levels and chronic disease treatment are provided in Supplementary Table 1.
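The reported incidence rate can be roughly reproduced from the figures in this paragraph. The Python sketch below approximates total person-years as the number of participants multiplied by the mean follow-up; this is an approximation made here for illustration, because the exact person-years are not given in this section.

```python
# Approximate check of the crude incidence rate from reported figures.
participants = 96_117
mean_followup_years = 6.7
cvd_events = 422

# Assumption: total person-years ~= participants x mean follow-up.
person_years = participants * mean_followup_years
rate_per_1000 = 1000 * cvd_events / person_years

print(round(rate_per_1000, 1))  # 0.7
```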
Characteristic | Total | Without CVD | Incident CVD | P |
---|---|---|---|---|
N | 96,117 | 95,695 | 422 | |
Age (years) | 44.2 (9.4) | 44.2 (9.4) | 48.6 (7.4) | <0.001 |
Men, % | 84.0 | 83.9 | 90.8 | <0.001 |
Current smoker, % | 33.5 | 33.4 | 52.6 | <0.001 |
Systolic blood pressure (mmHg) | 121.2 (14.9) | 121.2 (14.9) | 131.7 (15.4) | <0.001 |
Low-density lipoprotein cholesterol (mg/dL) | 120.6 (29.9) | 120.5 (29.9) | 130.2 (34.0) | <0.001 |
High-density lipoprotein cholesterol (mg/dL) | 58.9 (15.0) | 58.9 (15.0) | 54.6 (14.9) | <0.001 |
Diabetes, % | 7.4 | 7.3 | 23.5 | <0.001 |
Although sex was a potential predictor, it did not meet the inclusion criterion for variable selection (P<0.05) and was excluded. Table 2 presents the coefficients associated with each CVD predictor. The risk of CVD was positively associated with age, smoking, diabetes, systolic blood pressure, and LDL-C levels and inversely associated with HDL-C levels.
Predictor | β (SE) | Hazard ratio (95% CI) | P |
---|---|---|---|
Age (years) | 0.060 (0.006) | 1.06 (1.05, 1.08) | <0.001 |
Current smoker | |||
No | Reference | Reference | |
Yes | 0.720 (0.099) | 2.05 (1.69, 2.50) | <0.001 |
Systolic blood pressure (mmHg) | 0.035 (0.003) | 1.04 (1.03, 1.04) | <0.001 |
Low-density lipoprotein cholesterol (mg/dL) | 0.007 (0.002) | 1.01 (1.00, 1.01) | <0.001 |
High-density lipoprotein cholesterol (mg/dL) | -0.009 (0.004) | 0.99 (0.98, 1.00) | 0.011 |
Diabetes | |||
No | Reference | Reference | |
Yes | 0.799 (0.120) | 2.22 (1.76, 2.81) | <0.001 |
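The coefficients in Table 2 can be combined into a linear predictor, from which relative hazards follow directly. The Python sketch below is illustrative only. The dictionary keys, function names, and example profiles are hypothetical. The 10-year risk formula is the standard Cox form, 1 − S0(10)^exp(LP − LP_ref), but the baseline survival S0(10) and the reference profile are not reported in this section, so they are left as arguments that the user must supply.

```python
import math

# Coefficients (beta) as reported in Table 2.
BETA = {
    "age": 0.060,       # per year
    "smoker": 0.720,    # 1 = current smoker, 0 = not
    "sbp": 0.035,       # per mmHg
    "ldl_c": 0.007,     # per mg/dL
    "hdl_c": -0.009,    # per mg/dL
    "diabetes": 0.799,  # 1 = yes, 0 = no
}


def linear_predictor(profile):
    """Sum of beta * value over all predictors."""
    return sum(BETA[name] * profile[name] for name in BETA)


def hazard_ratio(profile, reference):
    """Relative hazard of `profile` compared with `reference`."""
    return math.exp(linear_predictor(profile) - linear_predictor(reference))


def ten_year_risk(profile, reference, s0_10):
    """Standard Cox risk formula; s0_10 is a placeholder for the baseline
    10-year survival at the reference profile (not given in the text)."""
    return 1.0 - s0_10 ** hazard_ratio(profile, reference)


# Hypothetical profiles differing only in smoking status.
non_smoker = {"age": 45, "smoker": 0, "sbp": 120, "ldl_c": 120,
              "hdl_c": 60, "diabetes": 0}
smoker = dict(non_smoker, smoker=1)

print(round(hazard_ratio(smoker, non_smoker), 2))  # 2.05
```

The printed value matches the hazard ratio for current smoking in Table 2, which is a useful consistency check on the coefficient table.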
Table 3 presents the discrimination, calibration, and overall performance. The model showed a good discriminative ability (apparent Harrell’s C-statistic=0.796; apparent AUROC=0.798). Furthermore, the calibration of the development data was excellent, with both mean and weak calibration values approaching or equal to 1. The calibration plot (Fig.1) indicated good agreement between the observed outcomes and predictions, with no obvious differences, except for the highest and third highest risk groups. The top 2 deciles of the predicted risk identified 53% of individuals who experienced the first CVD event during follow-up (sensitivity). The proportion of individuals without CVD events who were not in the top 2 deciles of the predicted risk was 80% (specificity). The Brier score was 0.006, indicating adequate overall model performance. Bootstrap internal validation showed little model overfitting. This was reflected in similar apparent and optimism-adjusted performance statistics (Table 3).
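The sensitivity and specificity quoted above come from flagging the participants in the top 2 deciles of predicted risk. The Python sketch below shows that calculation on hypothetical toy data. The function name is an assumption, and the sketch treats the outcome as a simple yes/no indicator and ignores censoring, which the actual analysis may have handled differently.

```python
def top_fraction_metrics(risks, events, fraction=0.2):
    """Sensitivity and specificity when the top `fraction` of predicted
    risks is flagged as high risk. `events` holds 1 (event) or 0 (none)."""
    n = len(risks)
    n_flagged = max(1, round(n * fraction))
    # Indices ordered from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: risks[i], reverse=True)
    flagged = set(order[:n_flagged])

    true_pos = sum(events[i] for i in flagged)
    total_events = sum(events)
    total_non_events = n - total_events
    false_pos = len(flagged) - true_pos

    sensitivity = true_pos / total_events
    specificity = (total_non_events - false_pos) / total_non_events
    return sensitivity, specificity


# Hypothetical toy data: 10 people, 2 events.
risks = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
events = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(top_fraction_metrics(risks, events))  # (0.5, 0.875)
```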
Performance measure | Apparent validation | Internal validation (Optimism corrected) |
---|---|---|
Discrimination | ||
Harrell’s C-statistic (time range) | 0.796 (0.775, 0.817) | 0.794 |
AUROC (fixed time) | 0.798 (0.780, 0.817) | 0.796 |
Calibration | ||
Time range | ||
Mean calibration | 1 | 0.999 |
Weak calibration (slope) | 1 | 0.989 |
Fixed time | ||
Mean calibration | 1 | 0.835 |
Weak calibration (slope) | 1 | 0.988 |
Overall | ||
Brier score | 0.006 (0.006, 0.007) | 0.006 |
AUROC = area under the receiver-operating characteristic curve.
The 2 lowest deciles were combined due to the small number of cases.
The risk prediction model was translated into risk scores as shown in Fig.2. The point allocation method for determining scores is presented in Supplementary Table 2. The total score ranged from 0 to 30 points. The 10-year CVD risk predicted by the risk score was well correlated with the predictions derived from the model (Spearman’s correlation coefficient r=0.966; regression coefficient β=1.018 [95% CI 1.016, 1.019]).
Risk score for estimating the 10-year CVD risk
Variable | Levels | Median | Assigned value | Difference from reference (a) | Coefficient (b) | Weight (c = a*b) | Point (c/0.1900) |
---|---|---|---|---|---|---|---|
Age | 30-39 | 34 | 35 (reference) | 0 | 0.0600 | 0 | 0 |
Age | 40-44 | 42 | 42 | 7 | 0.0600 | 0.4200 | 2 |
Age | 45-49 | 47 | 47 | 12 | 0.0600 | 0.7200 | 4 |
Age | 50-54 | 52 | 52 | 17 | 0.0600 | 1.0200 | 5 |
Age | 55-59 | 57 | 57 | 22 | 0.0600 | 1.3200 | 7 |
Age | 60-64 | 61 | 60 | 25 | 0.0600 | 1.5000 | 8 |
Smoking | No | 0 | 0 (reference) | 0 | 0.7199 | 0 | 0 |
Smoking | Yes | 1 | 1 | 1 | 0.7199 | 0.7199 | 4 |
Diabetes | No | 0 | 0 (reference) | 0 | 0.7994 | 0 | 0 |
Diabetes | Yes | 1 | 1 | 1 | 0.7994 | 0.7994 | 4 |
SBP | <120 | 110 | 110 (reference) | 0 | 0.0352 | 0 | 0 |
SBP | 120-129 | 124 | 125 | 15 | 0.0352 | 0.5280 | 3 |
SBP | 130-139 | 134 | 135 | 25 | 0.0352 | 0.8800 | 5 |
SBP | 140-149 | 144 | 145 | 35 | 0.0352 | 1.2320 | 6 |
SBP | 150-159 | 153 | 155 | 45 | 0.0352 | 1.5840 | 8 |
SBP | ≥ 160 | 167 | 165 | 55 | 0.0352 | 1.9360 | 10 |
LDL-C | <140 | 109 | 110 (reference) | 0 | 0.0067 | 0 | 0 |
LDL-C | ≥ 140 | 155 | 155 | 45 | 0.0067 | 0.3015 | 2 |
HDL-C | ≥ 60 | 70 | 70 (reference) | 0 | -0.0095 | 0 | 0 |
HDL-C | 40-59 | 50 | 50 | -20 | -0.0095 | 0.1900 | 1 |
HDL-C | <40 | 37 | 35 | -35 | -0.0095 | 0.3325 | 2 |
SBP, systolic blood pressure; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol.
Continuous predictors were transformed into categorical variables to calculate risk scores. Age was divided into six groups, with the youngest group serving as the reference. Values of 35, 42, 47, 52, 57, and 60 were assigned to their respective age groups. Similarly, levels of SBP, HDL-C, and LDL-C were categorized based on predefined intervals. For these variables, each category was assigned the nearest multiple of 5 to its median value. Regarding other categorical variables, the healthier of the dichotomous categories was designated as the reference (assigned a value of 0), while the unhealthier category (e.g., diabetes, smoking) was assigned a value of 1.
The points for each category (j) of each predictor (i) were determined using the formula:
$$\text{Point}_{ij} = \frac{\beta_i \,(W_{ij} - W_{i,\mathrm{REF}})}{\text{Constant}}$$
Here, $\beta_i$ represents the β estimate for predictor $i$ in the risk prediction model. $W_{ij}$ and $W_{i,\mathrm{REF}}$ are the assigned values for category $j$ and the reference category of predictor $i$, respectively. Thus, $W_{ij} - W_{i,\mathrm{REF}}$ signifies the distance of each category of each predictor from its reference category in the original units. The constant was set to 0.1900, the weight corresponding to a 20 mg/dL decrement in HDL-C, which is the smallest nonzero value of $\beta_i (W_{ij} - W_{i,\mathrm{REF}})$ across all categories, so that this category scores exactly 1 point.
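The point allocation can be reproduced from the coefficients and assigned values listed in the scoring table. The following Python sketch assumes that points are rounded to the nearest integer, which is not stated explicitly in the text but is consistent with the tabulated points; the dictionary layout and names are illustrative.

```python
# Constant stated in the text: weight of a 20 mg/dL decrement in HDL-C.
CONSTANT = 0.1900

# predictor: (beta, reference value, assigned values per category)
PREDICTORS = {
    "age": (0.0600, 35, [35, 42, 47, 52, 57, 60]),
    "smoking": (0.7199, 0, [0, 1]),
    "diabetes": (0.7994, 0, [0, 1]),
    "sbp": (0.0352, 110, [110, 125, 135, 145, 155, 165]),
    "ldl_c": (0.0067, 110, [110, 155]),
    "hdl_c": (-0.0095, 70, [70, 50, 35]),
}


def points(beta, reference, value):
    """Point = beta * (W - W_ref) / constant, rounded to the nearest
    integer (rounding rule assumed, see lead-in)."""
    return round(beta * (value - reference) / CONSTANT)


score_table = {
    name: [points(beta, ref, v) for v in values]
    for name, (beta, ref, values) in PREDICTORS.items()
}
max_total = sum(max(p) for p in score_table.values())

print(score_table["age"])  # [0, 2, 4, 5, 7, 8]
print(score_table["sbp"])  # [0, 3, 5, 6, 8, 10]
print(max_total)           # 30
```

The maximum of 30 points agrees with the reported score range of 0 to 30.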
The risk prediction model for men exhibited good discriminative ability and reliable calibration (Supplementary Table 3 and Supplementary Fig.1), similar to the risk prediction model for both men and women. Detailed risk scores for men are shown in Supplementary Fig.2.
Performance measure | Apparent validation | Internal validation (Optimism corrected) |
---|---|---|
Discrimination | ||
Harrell’s C-statistic (time range) | 0.797 (0.776, 0.818) | 0.795 |
AUROC (fixed time) | 0.801 (0.775, 0.827) | 0.795 |
Calibration | ||
Time range | ||
Mean calibration | 1 | 0.996 |
Weak calibration (slope) | 1 | 0.991 |
Fixed time | ||
Mean calibration | 1 | 0.813 |
Weak calibration (slope) | 1 | 0.990 |
Overall | ||
Brier score | 0.007 (0.006, 0.008) | 0.007 |
AUROC = area under the receiver-operating characteristic curve.
The two lowest deciles were combined due to the small number of cases.
Risk score for estimating the 10-year CVD risk (men only)
In this contemporary cohort study involving Japanese workers, we developed a CVD risk prediction model with good discriminatory performance and excellent calibration. To the best of our knowledge, this is the first study to develop a CVD risk model for the working population in Japan.
The proposed model includes well-established CVD risk factors such as smoking, systolic blood pressure, blood lipid levels, and diabetes. These risk factors are also commonly used in other CVD risk-prediction models. A meta-analysis revealed that 24% of models considered sex, while 69% were exclusively designed for either men or women and 7% did not include sex5). In our multivariable model, we observed that sex was not significantly associated with the risk of CVD, which is likely due to the fact that most predictors of CVD risk (e.g., smoking, blood lipid levels, and diabetes) are correlated with sex. Similarly, several previous studies have reported a non-significant association between sex and CVD, after accounting for the aforementioned traditional risk factors17, 18). Moreover, we found no evidence of interactions between sex and these risk factors, suggesting a similar relationship between these risk factors and CVD in men and women. It is noteworthy that all predictors integrated into this model can be obtained through periodic health checkups for employees in Japan. Therefore, this model can be easily implemented in the workplace to identify high-risk individuals and to introduce targeted preventive measures without additional efforts to gather other predictive information.
Our study demonstrated that the risk of CVD can be accurately predicted in the working population using data that are readily available from annual health checkups. The CVD risk prediction model showed good predictive ability, with an AUROC of approximately 0.80, which aligns with the reported ranges in previous studies among the general populations in Japan (0.78–0.81) and other countries (0.61–1.00)5, 8-11). Notably, the discrimination ability of the current model was found to be comparable to that of existing CVD risk models developed for working populations19, 20). For instance, a CVD risk model developed for a northern Italian working male population that incorporated lifestyle- and job-related conditions demonstrated an AUROC of 0.7519). In a large cohort of US male health professionals, the CVD risk model including lifestyle risk factors yielded an AUROC of 0.77 at 10 years20). In addition, both calibration indices (mean and weak calibration) suggested by the Strengthening Analytical Thinking for Observational Studies initiative and the calibration plot indicated that the model was well calibrated. Further studies are required to assess the performance of our CVD risk model in external populations.
Both national and international guidelines have documented that lifestyle modifications, such as smoking cessation, physical activity, a balanced diet, and adherence to medical treatments for conditions such as hypertension and dyslipidemia can reduce the risk of CVD21, 22). In the present study, we observed that more than one in three male workers was a current smoker, and chronic conditions such as hypertension (1 in 5 people) and dyslipidemia (1 in 2 people) were also common in the working population. However, less than half of those with these conditions are receiving medical treatment. This highlights the significant opportunity to reduce CVD risk in the working population through evidence-based interventions. The risk prediction model developed in this study can be used to identify high-risk individuals for targeted interventional studies aimed at reducing the risk of CVD.
Strengths and Limitations
The strengths of this study include the contemporaneity of the cohort and the substantial baseline sample size of the workers. Furthermore, all predictors used in our model were routinely collected during annual health checkups. However, this study has several limitations. First, the CVD registry data were mainly based on data from medical certificates written by a physician and submitted to the company by a worker. This registry primarily covers relatively severe cases as a medical certificate is mandated for long-term sick leave (≥ 2 weeks). Meanwhile, individuals with milder forms of CVD who take sick leave for less than 2 weeks are not required to submit a medical certificate, which may have resulted in an underestimation of these events in our study. Second, in approximately half of the fatal CVD cases, the cause of death was confirmed by a death certificate, while the remainder relied on less reliable sources. Given the relatively small number of fatal cases, the impact of potential inaccuracies from non-certificate causes on the overall results is likely to be limited. Most nonfatal cases were identified by treating physicians’ certificates, which are generally considered reliable despite the lack of additional validation data. Third, the relatively small number of women in our cohort may have affected the performance of our prediction models for women, despite our multivariable-adjusted model not showing a significant association between sex and CVD risk. Fourth, we did not have information regarding socioeconomic status, family history of CVD, occupational risk factors, and additional lifestyle factors such as alcohol consumption, diet, and physical activity, which could enhance the performance of our model. Fifth, the open cohort design included participants with recent data for the model coefficient estimation. However, this may have led to imprecision in the 10-year survival estimates, as not all participants provided 10 years of follow-up data. To assess this, we restricted the analysis to a closed cohort of participants enrolled between fiscal years 2011 and 2012, and found minimal differences in the baseline survival function. Finally, as our study was based on a Japanese occupational cohort, caution should be exercised when applying this model to other populations.
This study presents a CVD risk model that was specifically developed for practical applications in assessing the risk of CVD in workplace settings. This model can be used to identify populations with an elevated risk of CVD and assist in the design of targeted workplace-based preventive primary care interventions. Further studies are required to assess the external validity and transferability of the proposed CVD risk model.
This work was supported by the Industrial Health Foundation, Industrial Disease Clinical Research Grants (140202-01, 150903-01, 170301-01), JSPS KAKENHI Grants (JP25293146, JP25702006, JP16H05251, JP20H03952), and NCGM Intramural Research Fund (28-Shi-1206, 30-Shi-2003, 19A1006, 21A1020, 22A1008).
The authors declare that they have no conflicts of interest to disclose.