2022 Volume 4 Issue 1 Article ID: 2021-0023-OA
Objectives: Predictive models for the onset of metabolic syndrome (MS) for people in their 30s are scarce. This study aimed to construct a highly accurate model to predict MS onset by 40 years of age and to identify important predictors of MS onset using health checkup data of Japanese employees aged between 30 and 35 years. Methods: The study included 6,048 Japanese employees aged 40 years who underwent periodic health examinations over 10 years. We developed predictive models for MS onset using machine learning methods, including random forest and logistic regression models. For the random forest models, the variable importance of each explanatory variable was calculated to identify important predictors of MS onset. Results: Of the 2,998 participants aged 30 years, 164 developed MS by age 40 years, as did 180 of the 4,045 participants aged 35 years. The random forest models had higher predictive power than the logistic regression models (e.g., AU-ROC 0.867 for males aged 30 years). In these models, diastolic blood pressure was the most important predictor of MS onset for males, while body mass index was the most important predictor for females. Conclusions: We created machine learning models to predict MS onset at the age of 40 years with high accuracy from health examination data obtained at the age of 30 or 35 years. Sex differences in important predictors of MS onset were shown by the variable importance indices of the random forest. Applying our model in routine healthcare management could enable early health interventions to prevent MS onset.
Metabolic syndrome (MS) is a combination of metabolic disorders, including obesity, hyperglycemia, hypertension, and lipid abnormalities, which predispose patients to diabetes and cardiovascular disease1). There are over 1 billion people with MS worldwide, indicating the need for effective measures to prevent a further increase in MS prevalence2). Therefore, preventing MS is an important issue from a public health perspective. In addition, MS prevention has an emergent influence on health economics3), health inequalities4), and occupational health problems, including overwork injury prevention and the promotion of older workers5).
In 1999, the World Health Organization (WHO) proposed criteria for the diagnosis of MS6). Since then, two approaches to the diagnostic criteria for MS have been advanced. The first is based on the WHO concept, which includes insulin resistance and visceral fat6,7). The second is based on the overlap of risk factors for cardiovascular disease, including obesity and hypertension8,9). The former concept is mainly used in Japan7), while the latter is commonly used in the United States and European countries.
In Japan, people aged 40 years or older have the option to receive health guidance as a preventive measure against MS in accordance with national law. However, there are no MS prevention measures for people younger than 40 years old. To the best of our knowledge, few studies have examined long-term (at least 10 years), large-scale health checkup data of employees in their 30s. However, in recent years, health checkups and health guidelines for young workers in their 30s have been promoted in Japan, and there is a need for improvement in the methodology and rationale for providing health interventions for workers in this age bracket.
Previous studies have reported that the basis for MS is established before the age of 4010) and that the lifestyle choices in one’s 30s are associated with the later development of MS11,12,13). In addition, a previous study on weight control suggests that health education by the age of 35 years leads to weight reduction at the age of 40 years14). Another study also reported that health guidance for people in their 30s is important because lifestyle habits in their 30s are reflected in future health checkup results15).
Therefore, identifying individuals at high risk of MS in their 30s and intervening to improve lifestyle habits could be useful for the prevention of MS. However, to the best of our knowledge, there are no studies on the identification of high-risk groups for developing MS in their 30s. We hypothesized that the results of health checkups in one’s 30s reflect lifestyle habits in the same period and that MS onset in one’s 40s can be predicted using health checkup data. Longitudinal health checkup data collected from people in their 30s enable us to test this hypothesis.
From an analytical perspective, existing studies often use regression analysis for prediction10), and one limitation of regression analysis is collinearity of explanatory variables. Random forest (RF), a machine learning technique, is an analytical method that avoids this problem. A previous study reported that the prediction accuracy of a diabetes prediction model was higher with RF models than with logistic analysis models because of the differences in collinearity16).
In this study, we used longitudinal health examination data of males and females in their 30s from a single Japanese company. By using a highly interpretable machine learning method17) and comparing the detected variables to clinically well-known factors, we confirmed the validity of the models and identified important factors associated with the development of MS in males and females in their 30s.
The study included Japanese employees who were 30 years of age in 2008 or 2009 and who underwent continuous periodic health examinations conducted between 2008 and 2019 by Health Insurance Association A. Health Insurance Association A oversees health management across 525 business sites throughout Japan. The business sites span various industries and job types, including manufacturing, sales, engineering, and clerical positions. Of the 6,248 individuals who received physical examinations at age 30 years and the 6,235 individuals who underwent physical examinations at age 35 years, the participants who did not develop MS at age 30 years but developed MS at age 40 years were included in this analysis (Figure 1).
Figure 1. Participant selection flow.
*Study participants who could not be assessed for MS because of missing values. MS, metabolic syndrome.
We excluded participants who could not be evaluated for MS due to missing values and those confirmed to have MS prior to the age of 40 years. If a participant underwent two medical examinations in the same year, only the examination with fewer missing values was included in the analysis; when the number of missing values was the same in the two examinations, the first examination was used.
We prepared two datasets for analysis in this study. The first was created by combining data from the health examination at the age of 30 years with the MS evaluation data at the age of 40 years. The second was created by combining data from the health examination at the age of 35 years with the MS evaluation data at the age of 40 years. These two datasets were used to construct predictive models for MS onset at the age of 40 years and to examine important predictors of MS onset.
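The rule for duplicate examinations in the same year can be sketched as follows. This is an illustration in Python, not the authors' code (the analysis was performed in R); the record layout, with `None` marking a missing value, and the function name `pick_examination` are hypothetical.

```python
def count_missing(exam):
    """Number of missing (None) items in one examination record."""
    return sum(value is None for value in exam.values())


def pick_examination(exams_in_order):
    """Pick one examination from those taken in the same year.

    `exams_in_order` must be in chronological order. Python's min() returns
    the first minimal element, so a tie in the number of missing values is
    resolved in favour of the earlier examination, as described in the text.
    """
    return min(exams_in_order, key=count_missing)


# Hypothetical records for one participant in one year
first = {"bmi": 22.0, "sbp": None, "hdl": 55.0}
second = {"bmi": 22.4, "sbp": 118.0, "hdl": 57.0}
chosen = pick_examination([first, second])
```

Here `chosen` is the second record because it has fewer missing values.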
Measures
Outcome
The primary outcome of this study was the onset of MS at the age of 40 years. The diagnostic criteria for MS were based on the 2005 Japanese Journal of Internal Medicine criteria4), which are the most widely used criteria in Japan.
For the diagnosis of MS, a large waist circumference (male ≥85 cm, female ≥90 cm) was a mandatory criterion, along with at least two of the following exceeding standard values: blood pressure (systolic blood pressure ≥130 mmHg or diastolic blood pressure ≥85 mmHg), lipid levels (triglycerides ≥150 mg/dL or high-density lipoprotein cholesterol [HDL-C] ≤40 mg/dL), and blood glucose levels (fasting blood glucose ≥110 mg/dL). In addition, participants receiving medication for blood pressure, lipids, or blood glucose were considered to meet the criterion for the corresponding item, even if the standard values were not exceeded.
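The diagnostic rule can be written as a short function. The following Python sketch is illustrative only and is not the authors' code; the `Checkup` record and its field names are hypothetical, and the thresholds are copied from the text as stated.

```python
from dataclasses import dataclass


@dataclass
class Checkup:
    sex: str                 # "male" or "female"
    waist_cm: float
    sbp: float               # systolic blood pressure, mmHg
    dbp: float               # diastolic blood pressure, mmHg
    tg: float                # triglycerides, mg/dL
    hdl: float               # HDL-C, mg/dL
    fbg: float               # fasting blood glucose, mg/dL
    on_bp_med: bool = False
    on_lipid_med: bool = False
    on_glucose_med: bool = False


def has_ms(c):
    """Return True if the record meets the MS criteria described in the text."""
    waist_cutoff = 85.0 if c.sex == "male" else 90.0
    if c.waist_cm < waist_cutoff:      # waist circumference is mandatory
        return False
    blood_pressure = c.sbp >= 130 or c.dbp >= 85 or c.on_bp_med
    lipids = c.tg >= 150 or c.hdl <= 40 or c.on_lipid_med
    glucose = c.fbg >= 110 or c.on_glucose_med
    # at least two of the three additional items must be met
    return (blood_pressure + lipids + glucose) >= 2
```

Medication counts as meeting the corresponding item, so a participant with a large waist who is treated for both blood pressure and blood glucose is classified as having MS even with normal measured values.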
Predictive variables
In constructing the predictive model, we used 16 examination items and 12 interview items from the health examination prescribed by Japanese law. The validity and reliability of these interview items were verified by the Standard Health Examination and Health Guidance Program18,19) from the Japanese Ministry of Health, Labour and Welfare. The examination items included (1) body mass index (BMI) (kg/m²); (2) waist circumference (cm); (3) systolic blood pressure (mmHg); (4) diastolic blood pressure (mmHg); (5) HDL-C (mg/dL); (6) low-density lipoprotein cholesterol (LDL-C) (mg/dL); (7) triglycerides (mg/dL); (8) alanine aminotransferase (ALT) (U/L); (9) aspartate aminotransferase (AST) (U/L); (10) γ-glutamyl transpeptidase (γ-GTP) (U/L); (11) blood glucose (mg/dL); (12) hematocrit (%); (13) hemoglobin (g/dL); (14) red blood cells (10⁴/μL); (15) white blood cells (10²/μL); and (16) uric acid (mg/dL). LDL-C was measured using the direct method or estimated using the Friedewald formula. Blood glucose was measured in the fasting state.
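For reference, the Friedewald estimate mentioned above is commonly written as LDL-C = total cholesterol − HDL-C − triglycerides/5 (all in mg/dL) and is generally not applied when triglycerides are 400 mg/dL or higher. The sketch below assumes this standard form; the text does not state how the formula was applied in this study, and total cholesterol is not among the 16 listed items, so its availability is an assumption.

```python
def friedewald_ldl(total_cholesterol, hdl, triglycerides):
    """Estimate LDL-C (mg/dL) with the standard Friedewald formula.

    Returns None when triglycerides are >= 400 mg/dL, where the estimate
    is conventionally considered unreliable.
    """
    if triglycerides >= 400:
        return None
    return total_cholesterol - hdl - triglycerides / 5.0


# Hypothetical values, not taken from the study data
ldl = friedewald_ldl(total_cholesterol=200, hdl=50, triglycerides=150)
```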
The interview items included (1) daily alcohol consumption; (2) having breakfast; (3) paying attention to nutritional balance; (4) walking for more than 1 h per day; (5) walking speed; (6) intention to improve health; (7) eating before bed; (8) restful sleep; (9) eating too fast; (10) cigarette smoking; (11) weight gain >10 kg; and (12) exercising more than twice a week.
Physical measurements and blood tests were treated as continuous variables, and all questionnaire items were treated as binary variables in the prediction models. The cutoff points for converting questionnaire items with multiple options to binary variables were based on previous studies15,20). A summary table on variable processing is provided in eTable 1.
Statistical analysis
The predictive models for MS onset were created using machine learning methods, including RF and logistic regression (LR), and were evaluated with the area under the receiver operating characteristic curve (AU-ROC) and the area under the precision-recall curve (AU-PRC). A precision-recall curve is a graph with precision values on the y-axis and recall values on the x-axis.
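To make the two evaluation metrics concrete, the following Python sketch computes AU-ROC as the probability that a randomly chosen case scores higher than a randomly chosen non-case, and approximates AU-PRC by average precision. It is an illustration under these assumptions, not the authors' implementation; the R routines used in the study may interpolate the curves differently, and the labels and scores below are made up.

```python
def au_roc(y_true, scores):
    """AU-ROC via pairwise comparison of cases and non-cases (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise approximation of AU-PRC: mean precision at each true case.

    Tied scores are ranked in input order, which is a simplification.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            total += true_positives / rank   # precision at this recall step
    return total / n_pos


# Toy example (not study data)
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
roc = au_roc(y, p)             # 0.75
prc = average_precision(y, p)  # 5/6
```

A model that ranks at random has an expected AU-ROC of 0.5 but an expected AU-PRC close to the outcome prevalence, so when MS onset is rare, AU-PRC values are expected to be much lower than AU-ROC values for the same model.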
Training data were used to create the machine learning models, and test data were used to check model accuracy. The training and test datasets were randomly selected from the original dataset in a 4:1 ratio for males and a 1:1 ratio for females because of the small number of MS cases for females. In all models, the outcome was either the presence or absence of MS at the age of 40 years, and the 28 items, including all examination items and interview items, were used as explanatory variables. In these models, sex was treated as a stratified variable rather than an adjusted variable because outcome criteria are different for males and females.
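A random split in a given ratio can be sketched as below. This Python illustration assumes a simple shuffle-and-cut split; the text does not say whether the split was also stratified by outcome, and the function name, seed, and ID lists are hypothetical.

```python
import random


def split_by_ratio(ids, train_parts, test_parts, seed=0):
    """Randomly split `ids` into training and test sets in the given ratio."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_parts / (train_parts + test_parts))
    return shuffled[:n_train], shuffled[n_train:]


# 4:1 for males and 1:1 for females, using the age-30 sample sizes from the text
male_train, male_test = split_by_ratio(range(2342), 4, 1)
female_train, female_test = split_by_ratio(range(656), 1, 1)
```

With very few cases in the female datasets, a 1:1 split leaves more cases in the test set than a 4:1 split would, which is the motivation the text gives for using different ratios.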
In the construction of the predictive model, RF modelling was performed using the randomForest package of the statistical software R (R Foundation for Statistical Computing, Vienna, Austria). The number of decision trees was set to 1,000, and the minimum size of the terminal nodes was set to 1. All other parameters, including the number of features used to create the decision trees, were automatically set by the R caret package21). The Gini index was used as the impurity function14). The analysis was performed with 10-fold cross-validation.
In the LR model, all variables were entered as explanatory variables with the forced entry method so that performance could be compared using the same number of explanatory variables as in RF. To avoid complete separation caused by the small number of incident cases, Firth’s bias-reduced logistic regression was used to create the LR model for the 30-year-old females22). LR models were not used to evaluate the importance of the predictive variables because a small number of incident cases can affect the interpretation of those variables.
When we created the RF models, the variable importance of each explanatory variable was calculated to identify important predictors of MS. In the calculation of variable importance, we used RF with conditional inference trees (using the cForest package)23) because RF tends to underestimate the importance of categorical variables with fewer categories. In addition to creating the predictive models, we used multidimensional scaling (MDS) to evaluate the similarity between participants with and without MS onset. MDS visualizes the degree of similarity between individual participants in a dataset24). All analyses were performed using R version 3.6.1, with the significance level set at 0.05.
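As a reminder of what the Gini index measures, the sketch below computes the Gini impurity of a node and the impurity decrease achieved by a split, which is the quantity a classification tree tries to maximise at each node. It is a minimal Python illustration with toy labels, not the randomForest implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


# Toy node with two cases (1) and two non-cases (0), split perfectly
parent = [0, 0, 1, 1]
decrease = gini_decrease(parent, [0, 0], [1, 1])  # 0.5
```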
Ethical considerations
In this study, we used health checkup data collected by Health Insurance Association A. All datasets used in this study were anonymized and statistically processed in systems that were not connected to any external networks or the internet. Personal information was strictly protected and managed in accordance with the ethical guidelines established by the government (Ethical Guidelines for Medical Research Involving Human Subjects)25).
This study was approved by the Ethics Committee of the University of Yamanashi (Ethics Committee receipt number 2201) and by the Ethics Committee of Health Insurance Association A (receipt number 2019-002). The study details were described on the websites of the company and the university where the study was conducted, and all study participants were offered the opportunity to review and opt out of this study.
Of the 6,248 participants who underwent health checkups at the age of 30 years and the 6,235 participants who underwent health checkups at the age of 35 years, 2,998 (2,342 males and 656 females) aged 30 years and 4,045 (3,098 males and 947 females) aged 35 years had MS assessment data available and had not been confirmed to have MS before the age of 40 years. Of the 2,998 participants aged 30 years, 164 had developed MS by age 40 years, as had 180 of the 4,045 participants aged 35 years (Figure 1).
When blood test data taken at the age of 30 were compared between the MS-onset and non-MS-onset groups at the age of 40, all blood test data except HDL-C were significantly higher in the MS-onset group, and HDL-C was significantly lower in the MS-onset group (Table 1). Blood test data at the age of 35 years were also compared between the MS-onset and non-MS-onset groups aged 40 years, with the same results as at the age of 30 years.
Table 1 (a) Examination data
Examination data | 30-year-old: 40 years MS (-) | 30-year-old: 40 years MS (+) | p-value | 35-year-old: 40 years MS (-) | 35-year-old: 40 years MS (+) | p-value | 40-year-old: 40 years MS (-) | 40-year-old: 40 years MS (+) | p-value |
---|---|---|---|---|---|---|---|---|---|
male | n=2,186 | n=156 | | n=2,927 | n=171 | | n=3,857 | n=351 | |
Body mass index, kg/m²* | 22.03 (2.80) | 25.34 (3.27) | <0.001 | 22.60 (3.02) | 26.18 (3.02) | <0.001 | 23.26 (3.22) | 28.83 (4.07) | <0.001 |
Waist circumference, cm* | 77.95 (7.57) | 86.73 (8.58) | <0.001 | 80.18 (8.20) | 89.64 (7.69) | <0.001 | 82.28 (8.66) | 96.50 (9.23) | <0.001 |
Systolic blood pressure, mmHg* | 115.08 (11.45) | 123.35 (12.35) | <0.001 | 116.42 (11.10) | 125.91 (12.06) | <0.001 | 118.20 (11.43) | 134.99 (12.89) | <0.001 |
Diastolic blood pressure, mmHg | 68.94 (8.43) | 75.97 (9.42) | <0.001 | 70.97 (8.60) | 78.95 (10.03) | <0.001 | 73.67 (9.28) | 88.01 (10.42) | <0.001 |
HDL-C, mg/dL* | 58.16 (12.49) | 52.16 (13.61) | <0.001 | 57.14 (12.92) | 49.76 (11.45) | <0.001 | 58.01 (13.65) | 46.93 (10.98) | <0.001 |
LDL-C, mg/dL* | 111.59 (28.83) | 127.62 (32.99) | <0.001 | 119.20 (30.78) | 132.49 (35.76) | <0.001 | 123.54 (30.82) | 135.12 (33.04) | <0.001 |
Blood glucose, mg/dL* | 89.15 (7.91) | 92.22 (12.80) | <0.001 | 90.30 (8.43) | 94.38 (9.32) | <0.001 | 90.89 (9.91) | 105.85 (31.75) | <0.001 |
Uric acid, mg/dL* | 5.97 (1.11) | 6.63 (1.19) | <0.001 | 6.05 (1.12) | 7.01 (1.18) | <0.001 | 6.29 (1.23) | 7.22 (1.38) | <0.001 |
Hemoglobin, g/dL* | 15.30 (0.87) | 15.62 (0.89) | <0.001 | 15.33 (0.89) | 15.79 (0.87) | <0.001 | 15.25 (0.93) | 15.97 (0.97) | <0.001 |
Hematocrit, %* | 46.49 (2.62) | 47.49 (2.85) | <0.001 | 46.21 (2.71) | 47.66 (2.76) | <0.001 | 46.11 (2.79) | 48.08 (2.88) | <0.001 |
Red blood cells, 10⁴/μL* | 503.22 (31.52) | 515.49 (31.54) | <0.001 | 502.00 (31.87) | 517.09 (33.15) | <0.001 | 499.61 (33.44) | 522.59 (35.49) | <0.001 |
White blood cells, 10²/μL* | 58.30 (14.05) | 65.33 (15.89) | <0.001 | 60.05 (17.59) | 66.68 (18.64) | <0.001 | 59.79 (15.70) | 70.87 (17.61) | <0.001 |
Alanine aminotransferase, U/L† | 26 (14-26) | 30 (20-48) | <0.001 | 20(15-30) | 35(23-57) | <0.001 | 22(17-33) | 46(31-69) | <0.001 |
Aspartate aminotransferase, U/L† | 23 (17-23) | 23 (19-30) | <0.001 | 20(17-24) | 25(20-31) | <0.001 | 21(18-26) | 29(23-40) | <0.001 |
γ-glutamyl transpeptidase, U/L† | 31 (18-31) | 35.5 (24-58) | <0.001 | 25(18-38) | 43(30-74) | <0.001 | 28(20-46) | 60(41-99) | <0.001 |
Triglycerides, mg/dL† | 105 (56-105) | 117 (87-170) | <0.001 | 83(60-120) | 136(106-193) | <0.001 | 89(63-126) | 198(157-263) | <0.001 |
female | n=648 | n=8 | | n=938 | n=9 | | n=1,762 | n=24 | |
Body mass index, kg/m²* | 20.04 (2.59) | 30.98 (4.12) | <0.001 | 20.76 (3.12) | 34.68 (4.10) | <0.001 | 21.49 (3.50) | 34.10 (5.81) | <0.001 |
Waist circumference, cm* | 71.08 (7.22) | 95.89 (9.44) | <0.001 | 73.53 (8.29) | 105.69 (14.15) | <0.001 | 75.41 (9.01) | 104.88 (14.13) | <0.001 |
Systolic blood pressure, mmHg* | 105.51 (11.06) | 124.25 (12.67) | <0.001 | 107.59 (12.16) | 126.89 (12.44) | <0.001 | 110.24 (12.88) | 138.88 (14.95) | <0.001 |
Diastolic blood pressure, mmHg | 63.99 (8.58) | 75.88 (8.11) | <0.001 | 66.05 (9.39) | 78.44 (4.45) | <0.001 | 67.75 (10.19) | 86.04 (10.19) | <0.001 |
HDL-C, mg/dL* | 70.37 (13.71) | 55.00 (12.18) | .002 | 68.56 (14.17) | 56.00 (15.84) | .008 | 69.19 (14.49) | 48.92 (10.77) | <0.001 |
LDL-C, mg/dL* | 99.17 (23.17) | 127.75 (24.28) | .001 | 106.27 (26.73) | 133.89 (21.79) | .002 | 109.01 (28.16) | 140.54 (31.72) | <0.001 |
Blood glucose, mg/dL* | 84.74 (6.05) | 93.38 (6.50) | <0.001 | 86.52 (7.25) | 100.67 (9.45) | <0.001 | 87.46 (8.13) | 123.96 (52.13) | <0.001 |
Uric acid, mg/dL* | 4.19 (0.87) | 5.63 (1.26) | <0.001 | 4.23 (0.87) | 6.33 (1.28) | <0.001 | 4.35 (0.97) | 5.71 (1.14) | <0.001 |
Hemoglobin, g/dL* | 12.92 (1.09) | 14.03 (1.33) | .004 | 12.96 (1.10) | 13.98 (1.04) | .006 | 12.89 (1.26) | 14.27 (0.93) | <0.001 |
Hematocrit, %* | 40.25 (3.02) | 44.27 (3.37) | <0.001 | 40.01 (2.91) | 42.43 (2.62) | .013 | 40.01 (3.33) | 43.82 (1.91) | <0.001 |
Red blood cells, 10⁴/μL* | 438.23 (29.83) | 488.75 (41.93) | <0.001 | 442.08 (30.99) | 481.00 (24.12) | <0.001 | 442.10 (31.76) | 489.08 (32.09) | <0.001 |
White blood cells, 10²/μL | 57.68 (15.18) | 76.25 (16.88) | <0.001 | 58.30 (15.49) | 75.56 (11.66) | <0.001 | 58.82 (16.08) | 82.79 (14.79) | <0.001 |
Alanine aminotransferase, U/L† | 12 (10-15) | (14-30) | .008 | 12(10-15) | 24(17-89) | <0.001 | 16(12-21) | 39.5(25-48) | <0.001 |
Aspartate aminotransferase, U/L† | 17 (15-19) | (14-21) | .524 | 17(15-19) | 18(18-63) | .05 | 17(15-20) | 25.5(17-42) | <0.001 |
γ-glutamyl transpeptidase, U/L† | 15 (12-18) | (16-43) | .004 | 15(12-19) | 26(20-44) | <0.001 | 13(10-17) | 27(19-59) | <0.001 |
Triglycerides, mg/dL† | 52 (40-67) | (58-157) | .003 | 57(44-75) | 125(106-131) | <0.001 | 60(47-82) | 158(125-205) | <0.001 |
Table 1 (b) Questionnaire
Questionnaire | Response | 30-year-old: 40 years MS (-) | 30-year-old: 40 years MS (+) | p-value | 35-year-old: 40 years MS (-) | 35-year-old: 40 years MS (+) | p-value | 40-year-old: 40 years MS (-) | 40-year-old: 40 years MS (+) | p-value |
---|---|---|---|---|---|---|---|---|---|---|
male | | n=2,186 | n=156 | | n=2,927 | n=171 | | n=3,857 | n=351 | |
Drinking alcohol every day‡ | Yes | 216 (9.9) | 21 (13.5) | .168 | 410 (14.1) | 32 (18.8) | 0.091 | 633 (16.4) | 55 (15.7) | 0.763 |
No | 1,969 (90.1) | 135 (86.5) | 2,503 (85.9) | 138 (81.2) | 3,222 (83.6) | 296 (84.3) | ||||
Having breakfast‡ | Yes | 1,409 (64.5) | 90 (57.7) | .101 | 2,012 (69) | 100 (58.8) | 0.006 | 2,769 (71.8) | 229 (65.4) | 0.013 |
No | 777 (35.5) | 66 (42.3) | 902 (31) | 70 (41.2) | 1,085 (28.2) | 121 (34.6) | ||||
Paying attention to nutritional balance‡ | Yes | 1,509 (76.5) | 96 (73.8) | .522 | 2,345 (80.5) | 141 (82.9) | 0.485 | 1,958 (50.8) | 155 (44.3) | 0.022 |
No | 463 (23.5) | 34 (26.2) | 568 (19.5) | 29 (17.1) | 1,894 (49.2) | 195 (55.7) | ||||
Walking more than 1 hour per day‡ | Yes | 800 (36.6) | 48 (30.8) | .144 | 1,087 (37.3) | 61 (35.9) | 0.744 | 2,455 (63.7) | 181 (51.6) | <0.001 |
No | 1,383 (63.4) | 108 (69.2) | 1,824 (62.7) | 109 (64.1) | 1,400 (36.3) | 170 (48.4) | ||||
Walking speed‡ | Yes | 1,140 (52.2) | 70 (44.9) | .082 | 1,459 (50.1) | 76 (44.7) | 0.18 | 2,004 (52.0) | 134 (38.2) | <0.001 |
No | 1,045 (47.8) | 86 (55.1) | 1,454 (49.9) | 94 (55.3) | 1,850 (48.0) | 217 (61.8) | ||||
Intention to improve health‡ | Yes | 1,592 (72.9) | 131 (84) | .002 | 1,989 (68.3) | 140 (82.4) | <0.001 | 2,725 (70.7) | 300 (85.5) | <0.001 |
No | 591 (27.1) | 25 (16) | 924 (31.7) | 30 (17.6) | 1,129 (29.3) | 51 (14.5) | ||||
Eating before bed‡ | Yes | 1,223 (55.9) | 90 (58.1) | .617 | 1,677 (57.6) | 105 (62.1) | 0.263 | 754 (19.6) | 71 (20.3) | 0.726 |
No | 963 (44.1) | 65 (41.9) | 1,232 (42.4) | 64 (37.9) | 3,099 (80.4) | 279 (79.7) | ||||
Getting rest from sleep‡ | Yes | 1,223 (56.2) | 81 (52.3) | .358 | 1,710 (58.7) | 102 (60) | 0.81 | 2,688 (69.7) | 219 (62.4) | 0.005 |
No | 954 (43.8) | 74 (47.7) | 1,202 (41.3) | 68 (40) | 1,166 (30.3) | 132 (37.6) | ||||
Eating too fast‡ | Yes | 776 (35.5) | 70 (45.2) | .019 | 979 (33.6) | 84 (49.4) | <0.001 | 1,502 (39.0) | 142 (40.5) | 0.607 |
No | 1,408 (64.5) | 85 (54.8) | 1,935 (66.4) | 86 (50.6) | 2,350 (61.0) | 209 (59.5) | ||||
Smoke cigarettes‡ | Yes | 730 (33.4) | 65 (41.7) | .044 | 916 (31.4) | 62 (36.5) | 0.176 | 2,305 (59.8) | 225 (64.1) | 0.124 |
No | 1,456 (66.6) | 91 (58.3) | 1,997 (68.6) | 108 (63.5) | 1,549 (40.2) | 126 (35.9) | ||||
Gained more than 10 kg in weight‡ | Yes | 393 (18) | 75 (48.1) | <0.001 | 805 (29.1) | 109 (65.7) | <0.001 | 1,576 (41.6) | 285 (82.1) | <0.001 |
No | 1,793 (82) | 81 (51.9) | 1,957 (70.9) | 57 (34.3) | 2,213 (58.4) | 62 (17.9) | ||||
Exercise more than twice a week‡ | Yes | 369 (16.9) | 39 (25) | .016 | 515 (17.7) | 29 (17.2) | 0.918 | 766 (19.9) | 60 (17.1) | 0.233 |
No | 1,816 (83.1) | 117 (75) | 2,399 (82.3) | 140 (82.8) | 3,089 (80.1) | 291 (82.9) | ||||
female | | n=648 | n=8 | | n=938 | n=9 | | n=1,762 | n=24 | |
Drinking alcohol every day‡ | Yes | 23 (3.5) | 1 (12.5) | .259 | 56 (6.00) | 1 (11.1) | .430 | 138 (7.8) | 2 (8.30) | .712 |
No | 625 (96.5) | 7 (87.5) | 880 (94.0) | 8 (88.9) | 1,622 (92.2) | 22 (91.7) | ||||
Having breakfast‡ | Yes | 142 (21.9) | 1 (12.5) | 1 | 759 (81.0) | 8 (88.9) | 1 | 1,431 (81.3) | 18 (75.0) | .430 |
No | 506 (78.1) | 7 (87.5) | 178 (19.0) | 1 (11.1) | 329 (18.7) | 6 (25.0) | ||||
Paying attention to nutritional balance‡ | Yes | 521 (85) | 6 (85.7) | 1 | 817 (87.3) | 8 (88.9) | 1 | 1,562 (88.8) | 21 (87.5) | .746 |
No | 92 (15) | 1 (14.3) | 119 (12.7) | 1 (11.1) | 198 (11.2) | 3 (12.5) | ||||
Walking more than 1 hour per day‡ | Yes | 176 (27.2) | 1 (12.5) | .689 | 237 (25.3) | 4 (44.4) | .244 | 526 (29.9) | 8 (33.3) | .823 |
No | 472 (72.8) | 7 (87.5) | 700 (74.7) | 5 (55.6) | 1,233 (70.1) | 16 (66.7) | ||||
Walking speed‡ | Yes | 245 (37.8) | 2 (25) | .717 | 380 (40.6) | 3 (33.3) | .746 | 758 (43.1) | 5 (20.8) | .036 |
No | 403 (62.2) | 6 (75) | 557 (59.4) | 6 (66.7) | 1,001 (56.9) | 19 (79.2) | ||||
Intention to improve health‡ | Yes | 504 (78) | 6 (75) | .691 | 698 (74.7) | 8 (88.9) | .463 | 1,361 (77.3) | 22 (91.7) | .136 |
No | 142 (22) | 2 (25) | 237 (25.3) | 1 (11.1) | 399 (22.7) | 2 (8.30) | ||||
Eating before bed‡ | Yes | 267 (41.3) | 1 (12.5) | .150 | 294 (31.4) | 3 (33.3) | 1 | 461 (26.2) | 6 (25.0) | 1 |
No | 380 (58.7) | 7 (87.5) | 643 (68.6) | 6 (66.7) | 1,298 (73.8) | 18 (75.0) | ||||
Getting rest from sleep‡ | Yes | 347 (53.6) | 6 (75) | .298 | 496 (53.1) | 3 (33.3) | .320 | 912 (51.8) | 12 (50.0) | 1 |
No | 300 (46.4) | 2 (25) | 438 (46.9) | 6 (66.7) | 848 (48.2) | 12 (50.0) | ||||
Eating too fast‡ | Yes | 139 (21.5) | 2 (25) | .684 | 201 (21.5) | 3 (33.3) | .415 | 436 (24.8) | 7 (29.2) | .636 |
No | 508 (78.5) | 6 (75) | 735 (78.5) | 6 (66.7) | 1,324 (75.2) | 17 (70.8) | ||||
Smoke cigarettes‡ | Yes | 81 (12.5) | 2 (25) | .268 | 92 (9.80) | 1 (11.1) | .608 | 183 (10.4) | 4 (16.7) | .308 |
No | 567 (87.5) | 6 (75) | 845 (90.2) | 8 (88.9) | 1,577 (89.6) | 20 (83.3) | ||||
Gained more than 10 kg in weight‡ | Yes | 49 (7.6) | 5 (62.5) | <0.001 | 110 (12.0) | 8 (88.9) | <0.001 | 360 (22.2) | 22 (100.0) | <0.001 |
No | 597 (92.4) | 3 (37.5) | 803 (88.0) | 1 (11.1) | 1,260 (77.8) | 0 (00.0) | ||||
Exercise more than twice a week‡ | Yes | 66 (10.2) | 1 (12.5) | .580 | 79 (8.4) | 2 (22.2) | .176 | 199 (11.3) | 3 (12.5) | .747 |
No | 582 (89.8) | 7 (87.5) | 858 (91.6) | 7 (77.8) | 1,561 (88.7) | 21 (87.5) |
Values are presented as mean (SD), median (IQR), or n (%). *t-test; †Mann–Whitney U test; ‡Chi-square test.
HDL-C, High-density lipoprotein cholesterol; IQR, interquartile range; LDL-C, low-density lipoprotein cholesterol; MS, metabolic syndrome; SD, standard deviation.
For the health examination questions at the ages of 30 and 35 years, the prevalence of “yes” responses was significantly higher in the MS-onset group for six questions (“Drinking alcohol every day,” “Not having breakfast,” “Intention to improve,” “Gained more than 10 kg in weight,” “Eating too fast,” and “Smoke cigarettes”) (Table 1).
We compared the predictive accuracy of the models using AU-ROC and AU-PRC and found that RF had a higher AU-ROC and AU-PRC than LR in all the models, although the differences in accuracy were not significant (Table 2 and Figure 2).
| Prediction target | Metric | Random forest | Logistic regression | p-value* |
|---|---|---|---|---|
| Prediction for 30-year-old males | AU-ROC | 0.867 | 0.852 | .64 |
| | Sensitivity† | 0.882 | 0.765 | |
| | Specificity† | 0.737 | 0.863 | |
| | AU-PRC | 0.392 | 0.356 | |
| Prediction for 35-year-old males | AU-ROC | 0.876 | 0.852 | .15 |
| | Sensitivity† | 0.818 | 0.879 | |
| | Specificity† | 0.820 | 0.767 | |
| | AU-PRC | 0.444 | 0.427 | |
| Prediction for 30-year-old females | AU-ROC | 0.985 | 0.748 | .31 |
| | Sensitivity† | 1.000 | 1.000 | |
| | Specificity† | 0.974 | 0.510 | |
| | AU-PRC | 0.550 | 0.377 | |
| Prediction for 35-year-old females | AU-ROC | 0.981 | 0.940 | .16 |
| | Sensitivity† | 1.000 | 1.000 | |
| | Specificity† | 0.927 | 0.882 | |
| | AU-PRC | 0.613 | 0.271 | |
*DeLong’s test for two correlated ROC curves. †At the optimal point on the ROC curve.
AU-PRC, area under the precision-recall curve; AU-ROC, area under the receiver operating characteristic curve.
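The text does not state how the optimal point on the ROC curve was chosen. One common choice, assumed here purely for illustration, is the threshold that maximises Youden's J (sensitivity + specificity − 1). The Python sketch below uses made-up labels and scores.

```python
def best_point_youden(y_true, scores):
    """Return (threshold, sensitivity, specificity) maximising Youden's J.

    A participant is predicted positive when score >= threshold.
    The first threshold reaching the maximum is kept if several tie.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, t, sensitivity, specificity)
    return best[1], best[2], best[3]


# Toy example (not study data)
labels = [0, 0, 0, 1, 1]
predicted = [0.1, 0.2, 0.6, 0.5, 0.9]
threshold, sens, spec = best_point_youden(labels, predicted)
```

In this toy example the selected threshold is 0.5, which detects both cases at the cost of one false positive.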
Figure 2. Receiver operating characteristic (ROC) curve and precision-recall (PR) curve for predicting the onset of metabolic syndrome via the random forest and logistic regression models. A PR curve is a graph with precision values (i.e., positive predictive value) on the y-axis and recall values (i.e., sensitivity) on the x-axis.
We assessed the importance of the predictors by calculating the variable importance of the explanatory variables in the RF models, and diastolic blood pressure was shown to be the most important predictor of MS onset in males aged 30 and 35 years. LDL-C, BMI, HDL-C, waist circumference, and walking time were also identified as important factors for predicting MS onset among male participants aged 30 years. HDL-C and skipping breakfast were also revealed as important factors among male participants aged 35 years. For female participants aged 30 and 35 years, BMI, waist circumference, uric acid levels, and triglyceride levels were the most important predictors of MS onset (Figure 3).
Figure 3. Important predictors of metabolic syndrome onset, with variable importance calculated using the random forest method.
*Variable importance is expressed as a percentage. The x-axis indicates the variable importance in the random forest model, scaled so that the most influential variable in each model is set to 100%.
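The scaling described in this caption is a simple normalisation by the largest importance value. A minimal Python sketch follows; the variable names and raw values are placeholders and are not results from the study.

```python
def scale_importance(raw_importance):
    """Rescale raw importances so the most influential variable equals 100%.

    Returns a dict ordered from most to least important.
    """
    top = max(raw_importance.values())
    if top <= 0:
        raise ValueError("the largest importance must be positive")
    ordered = sorted(raw_importance.items(), key=lambda item: -item[1])
    return {name: 100.0 * value / top for name, value in ordered}


# Placeholder values for illustration only
raw = {"var_a": 2.0, "var_b": 4.0, "var_c": 1.0}
relative = scale_importance(raw)
```

Because each model is scaled to its own maximum, the percentages can be compared between variables within one model but not directly between models.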
The MDS plots of the male participants at the ages of 30 and 35 years were divided into two major clusters, and most of the MS-onset group was included in the right cluster of the plot. Conversely, in the MDS plots of the female participants at the age of 30 and 35 years, non-MS cases formed a single cluster and MS-onset cases were scattered away from the non-MS-onset cluster (Figure 4).
Figure 4. Clustering of young workers on the multidimensional scaling (MDS) plot using the random forest method. The similarity between samples can be evaluated from the distance between dots on the MDS plot. Dim1 and Dim2 represent the eigenvectors of the proximity matrix from the random forest model.
In this study, we applied the RF and LR machine learning methods to 10-year longitudinal health checkup data from individuals aged 30 and 35 years employed at a Japanese company and developed models to predict MS onset among these individuals. RF models were found to have a higher predictive power than logistic regression models. We also determined important predictors by comparing the variable importance in RF models. The MDS plots using the RF method showed different characteristics between the participants with MS and those without MS onset.
In this study, the accuracy of the RF predictive model was higher than that of the LR model. LR models assume that samples can be linearly stratified by outcome, whereas the calculation methods used in RF models ensure stratification26). Furthermore, RF has been reported to perform better than other methods when there are many explanatory variables and interactions between variables27). In the present study, 28 explanatory variables were used to create the model, and variables with interactions (e.g., BMI and waist circumference) were included. This may be why the accuracy of predicting the onset of MS was higher in RF models than in LR models.
In a previous study, the predictors of MS onset were investigated using regression analysis in Japanese males aged 30, 35, and 40 years, and an increase in BMI was reported as the most important predictor13). However, in the present study, diastolic blood pressure was found to be the most important predictor. This difference may be due to multicollinearity that occurred differently in the RF and LR models. A previous study showed that RF has better performance for nonlinear relationships than regression analysis (Cox proportional hazards regression)16). This characteristic of RF may lead to the presence of different factors for MS prediction. Another previous study using the neural network method also showed diastolic blood pressure as an important predictor28), and another study also reported that diastolic blood pressure is associated with the development of MS at a young age29). These studies support the results of the present study. The value of diastolic blood pressure may be more useful for judging the future onset of MS than the increase in BMI because blood pressure can be ascertained at one point, while the increase in BMI cannot be measured at one time.
In the analysis of female participants, the most important predictor of MS was BMI, followed by waist circumference, as noted in a previous study13). Waist circumference is a mandatory criterion for determining the onset of MS, and the waist circumference threshold is higher for females (≥90 cm) than for males (≥85 cm). Therefore, the importance of waist circumference and BMI might be higher in female participants than in male participants.
Among the questionnaire items, walking time in males at the age of 30 years, skipping breakfast in males at the age of 35 years, walking speed in females at the age of 30 years, and not feeling rested from sleep in females at the age of 35 years were identified as important predictors. A previous study reported that daily exercise habits, regular diet, and restful sleep were associated with the development of MS in both males and females20). This finding is consistent with the results of the current study.
In the MDS plot of female participants, MS cases were sporadically located away from the cluster. This sporadic population may represent a future unhealthy population. In the plot of male participants, however, MS cases were concentrated in one of two separate clusters; thus, clusters with a high concentration of MS cases may represent future unhealthy populations. The differences between MS and non-MS participants in the MDS plot were more conspicuous among female participants than among male participants. This may have been because the criteria for determining MS were stricter in females than in males, so that females who experienced MS onset at the age of 40 had more distinctive characteristics than males in the same situation.
The strength of our study is our construction of models with machine learning methods to predict the onset of MS in males and in females using large longitudinal health examination data of young people collected over a 10-year period at a Japanese company.
In addition, using a highly interpretable machine learning method, we were able to identify important predictors from many health checkup items. To the best of our knowledge, this is the first study to develop predictive models with machine learning methods to predict the onset of MS using longitudinal data of males and females in their 30s.
Limitations
This study has several limitations. First, there is a possibility of selection bias because we developed and evaluated the predictive model using only data of healthy employees who voluntarily underwent health checkups at various business sites of a large company. We also included multiple industry types; thus, the study population may not represent average young workers in Japan.
Second, it is difficult to apply the same result to populations in other countries because this study only used Japanese health examination data and MS criteria specific to Japan. However, since East Asians share similar characteristics, the present results may be applicable to East Asia, including China, Korea, and Taiwan. Furthermore, when the MS criteria of the International Diabetes Federation were applied to this study, we obtained results similar to those of the main analyses (eFigure 1 and eFigure 2).
Third, a small sample size and low incidence of MS in female participants might lower the predictive ability of the models for the female population. Hence, we used Firth’s bias reduction method to reduce the problem of complete separation caused by the small number of cases.
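The effect of Firth's correction under complete separation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single predictor plus an intercept, uses hypothetical toy data, and the function name `firth_logistic` is chosen only for illustration. The code iterates on the Firth-modified score, in which each residual y_i − π_i is adjusted by h_i(0.5 − π_i), where h_i is the i-th diagonal element of the hat matrix.

```python
import math

def firth_logistic(x, y, n_iter=50, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x using Firth's bias-reduced score equations."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the two parameters (intercept, slope).
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        inv = ((d / det, -b / det), (-b / det, a / det))
        # Hat-matrix diagonals: h_i = w_i * [1, x_i] (X'WX)^-1 [1, x_i]'.
        h = [wi * (inv[0][0] + 2.0 * inv[0][1] * xi + inv[1][1] * xi * xi)
             for wi, xi in zip(w, x)]
        # Firth-modified residuals and score vector.
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step: (X'WX)^-1 times the modified score.
        s0 = inv[0][0] * u0 + inv[0][1] * u1
        s1 = inv[1][0] * u0 + inv[1][1] * u1
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1

# Hypothetical, completely separated data: every x < 0 is a non-case, every x > 0 a case.
x_toy = [-2.0, -1.0, 1.0, 2.0]
y_toy = [0, 0, 1, 1]
intercept, slope = firth_logistic(x_toy, y_toy)
print(intercept, slope)
```

On data like these, ordinary maximum likelihood has no finite slope estimate, whereas the penalized fit returns a finite one. In practice an established statistical package would be preferable to this hand-rolled two-parameter version.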
Fourth, we could not prepare an external dataset for this study. Very few companies have collected health checkup data for 10 consecutive years starting in the employees’ 30s; therefore, it is impractical to prepare an external dataset. To address this problem, we validated the models internally using 10-fold cross-validation.
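For readers unfamiliar with the procedure, 10-fold cross-validation with AU-ROC as the metric can be sketched as follows. This is an illustrative outline under stated assumptions, not the analysis code of this study: the function names, the stratified fold assignment, and the `fit`/`predict` placeholders are all hypothetical, and the toy data are invented for the example.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample to one of k folds so that cases are spread evenly."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds

def auc(scores, labels):
    """AU-ROC as the probability that a case scores higher than a non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(features, labels, fit, predict, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k AU-ROC values."""
    folds = stratified_folds(labels, k, seed)
    aucs = []
    for f in range(k):
        train = [i for i in range(len(labels)) if folds[i] != f]
        test = [i for i in range(len(labels)) if folds[i] == f]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [predict(model, features[i]) for i in test]
        aucs.append(auc(scores, [labels[i] for i in test]))
    return sum(aucs) / k

# Hypothetical toy data: one feature per person, 20 cases and 80 non-cases.
features = [1.0 + 0.01 * i for i in range(20)] + [0.01 * i for i in range(80)]
labels = [1] * 20 + [0] * 80
# Placeholder "model": the feature itself serves as the risk score.
mean_auc = cross_validated_auc(features, labels,
                               fit=lambda X, y: None,
                               predict=lambda model, x: x)
print(mean_auc)
```

Any learner, such as a random forest or a logistic regression, could be passed in through `fit` and `predict`. Stratifying the folds matters when the outcome is rare, because an unstratified split can leave a fold with no cases and an undefined AU-ROC.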
Fifth, we had a moderate number of study exclusions. Most exclusions were due to missing health examination data at the age of 40 years. Comparison of baseline characteristics between the study subjects and those that were excluded showed that study subjects were slightly worse in BMI, HDL-C, “bedtime eating,” and “willingness to improve” than the study exclusions (eTable 2). The performance of the developed model may be slightly reduced when the model is applied to those with characteristics similar to the study exclusions.
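As a worked example of such a comparison, the sketch below applies a Pearson chi-square test to the smoking counts for males aged 30 years listed in eTable 2 (excluded: 707 smokers, 1,055 non-smokers; analysis: 795 smokers, 1,547 non-smokers). The function name is hypothetical, and the absence of a continuity correction is an assumption; the text does not state which variant of the test was used.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: excluded vs. analysis; columns: smoker yes vs. no (males aged 30, eTable 2).
chi2, p_value = chi_square_2x2(707, 1055, 795, 1547)
print(round(chi2, 2), p_value < 0.001)
```

The resulting p-value falls below 0.001, in line with the value reported for this row of the table.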
Sixth, the current study does not include socioeconomic factors (e.g., education and household income) or occupational factors (e.g., long working hours and shift work), which are important factors in predicting the onset of MS. Therefore, the validity of this study needs further investigation.
Seventh, limiting the outcome of this study to only 40 years of age may lead to an underestimation of MS incidence. However, since the health checkup items related to MS were introduced in 2008, it is difficult to secure sufficient data on MS onset for workers who were 30 years old at that time. Although we have data up to the age of 41 years, evaluating only those with data at age 41 years for 2 years and those with data at age 40 years for a single year may lead to bias in the prediction models. Therefore, we focused our analysis on MS onset at 40 years of age.
Finally, the method of measuring LDL-C in this study was not standardized, although the Friedewald equation is the preferred method of estimating LDL cholesterol for diagnosing MS. Furthermore, reagents for blood testing and physical measurement protocols (e.g., weight and blood pressure) were not standardized across health checkup sites. This might have affected the prediction model and the interpretation of variable importance in this study.
Clinical indication
By applying the predictive model developed in this study to the health checkup data of males and females in their 30s, it may be possible to prevent MS onset in an effective way. For instance, companies and municipalities with limited medical resources can identify high-risk groups for MS by applying our models to their data.
In this study, in addition to BMI and waist circumference, diastolic blood pressure, LDL-C, and HDL-C in males and uric acid and triglyceride in females were noted as important predictors of MS. Additionally, walking habits in both males and females, skipping breakfast in males, and restful sleep in females were also presented as important factors. Based on these results, we can prevent MS onset efficiently by focusing on items by sex when conducting health guidance for people in their 30s.
Until recently, legal medical checkups in Japan were not required to include items related to MS onset (e.g., blood glucose and lipids) in one’s 30s. However, since April 2018, revisions to the law have made many companies include these items for people under the age of 40. Therefore, we believe that the prediction model developed in this study will be useful for people in their 30s in the future.
A previous study showed that health guidance for people under 40 years of age is effective for the prevention of MS onset30). Therefore, using this model to identify high-risk subjects in their 30s, we may efficiently prevent the onset of MS. Furthermore, providing health guidance to young people at high risk and focusing on the predictors identified in this study might lead to the effective prevention of MS onset, resulting in a reduction in the nation’s healthcare costs.
We developed a high-accuracy predictive model with a machine learning method that predicts MS onset at the age of 40 years based on health examination data obtained at the ages of 30 and 35 years. Some important sex-specific predictors were identified using this highly interpretable machine learning method. Applying our models to routine healthcare management should provide early and appropriate health interventions to young people for preventing the onset of MS in this population.
We are grateful to all the staff of Health Insurance Association A for preparing the dataset for the current study. The analysis code of this study can be found on the following website: (https://github.com/mysuda/tokuho_pred).
There are no conflicts of interest to declare.
The study protocol was examined and approved by the Ethics Committee of the University of Yamanashi (Ethics Committee receipt number R01688). The study was also approved by the Ethics Committee of Health Insurance Association A (receipt number 2019-002). All participants had the opportunity to opt out.
The authors received no financial support for the research, authorship, and publication of this article.
M.S. and T.O. participated in the design and conception of the study and its coordination, data acquisition, statistical analysis, and manuscript drafting. M.S. and T.O. contributed equally to this manuscript. Z.Y. reviewed the analysis and manuscript.
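Questionnaire item | Response option | Binary category | Code
---|---|---|---
Drinking alcohol every day | Drink daily | yes | 1
Drinking alcohol every day | Drink 4–6 days a week | no | 0
Drinking alcohol every day | Drink 1–3 days a week | no | 0
Drinking alcohol every day | I don’t drink. | no | 0
Having breakfast | | yes | 1
Having breakfast | | no | 0
Paying attention to nutritional balance | Not paying much attention | no | 1
Paying attention to nutritional balance | Paying attention from time to time | yes | 0
Paying attention to nutritional balance | Always attentive | yes | 0
Walking more than 1 hour per day | | yes | 1
Walking more than 1 hour per day | | no | 0
Walking speed | slow | no | 1
Walking speed | fast | yes | 0
Intention to improve health | I don’t intend to improve | no | 1
Intention to improve health | I will improve it (within 6 months) | yes | 0
Intention to improve health | I plan to improve it in the near future. I’m starting little by little (within 1 month). | yes | 0
Intention to improve health | Already working on it (under 6 months). | yes | 0
Intention to improve health | Already working on it (over 6 months). | yes | 0
Eating before bed | | yes | 1
Eating before bed | | no | 0
Getting rest from sleep | | no | 1
Getting rest from sleep | | yes | 0
Eating too fast | fast | yes | 1
Eating too fast | average | no | 0
Eating too fast | slow | no | 0
Smoke cigarettes | current smoker | yes | 1
Smoke cigarettes | former smoker | no | 0
Smoke cigarettes | never smoker | no | 0
Gained more than 10 kg in weight | | yes | 1
Gained more than 10 kg in weight | | no | 0
Exercises more than twice a week | | no | 1
Exercises more than twice a week | | yes | 0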
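The binary coding in the questionnaire table above could be applied programmatically along the following lines. This is a hedged sketch: only four of the items are transcribed, and the dictionary layout, the function name `encode_response`, and the example record are illustrative assumptions rather than part of the study's code.

```python
# Coding rules transcribed from four rows of the questionnaire coding table.
# The data structure and names are hypothetical, chosen only for this example.
BINARY_CODING = {
    "Smoke cigarettes": {"current smoker": 1, "former smoker": 0, "never smoker": 0},
    "Eating too fast": {"fast": 1, "average": 0, "slow": 0},
    "Walking speed": {"slow": 1, "fast": 0},
    "Getting rest from sleep": {"no": 1, "yes": 0},
}

def encode_response(item, response):
    """Return the 0/1 code for one questionnaire answer."""
    return BINARY_CODING[item][response]

# Hypothetical answers from one participant.
record = {"Smoke cigarettes": "former smoker", "Walking speed": "slow"}
encoded = {item: encode_response(item, answer) for item, answer in record.items()}
print(encoded)
```

Keeping the coding rules in one lookup table makes it easy to check them against the questionnaire and to see which answers were merged into the same category.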
Examination data | 30 years old, Excluded | 30 years old, Analysis | 30 years old, p-value | 35 years old, Excluded | 35 years old, Analysis | 35 years old, p-value |
---|---|---|---|---|---|---|---|
Male | n=1,817 | n=2,342 | | n=1,283 | n=3,098 | | |
Body mass index, kg/m² * | 22.79±3.88 | 22.25±2.95 | <0.001 | 23.77±4.27 | 22.80±3.13 | <0.001 | |
Waist circumference, cm* | 80.10±10.25 | 78.53±7.94 | <0.001 | 83.58±11.20 | 80.70±8.46 | <0.001 | |
Systolic blood pressure, mmHg * | 117.52±12.37 | 115.63±11.6 | <0.001 | 119.86±13.42 | 116.95±11.36 | <0.001 | |
Diastolic blood pressure, mmHg* | 70.14±9.34 | 69.41±8.67 | .010 | 73.68±10.57 | 71.41±8.87 | <0.001 | |
HDL-C, mg/dL* | 56.30±13.12 | 57.76±12.65 | <0.001 | 54.90±13.89 | 56.73±12.95 | <0.001 | |
LDL-C, mg/dL* | 112.56±28.72 | 112.67±29.39 | .914 | 121.41±31.04 | 119.94±31.21 | .167 | |
Blood glucose, mg/dL* | 90.16±11.75 | 89.35±8.36 | .012 | 92.88±18.02 | 90.53±8.53 | <0.001 | |
Uric acid, mg/dL* | 6.05±1.22 | 6.01±1.12 | .288 | 6.19±1.29 | 6.11±1.14 | .031 | |
Hemoglobin, g/dL* | 15.39±0.92 | 15.32±0.88 | .018 | 15.48±0.94 | 15.36±0.90 | <0.001 | |
Hematocrit, %* | 46.66±2.74 | 46.56±2.64 | .251 | 46.45±2.78 | 46.29±2.73 | .075 | |
Red blood cells, 10⁴/μL* | 505.58±32.90 | 504.04±31.66 | .143 | 506.96±34.17 | 502.83±32.12 | <0.001 | |
White blood cells, 10²/μL* | 61.71±16.80 | 58.77±14.28 | <0.001 | 63.67±18.20 | 60.42±17.71 | <0.001 | |
Alanine aminotransferase, U/L † | 19 (14–31) | 19 (14–27) | .048 | 23 (16–37) | 21 (15–31) | <0.001 | |
Aspartate aminotransferase, U/L† | 20 (16–24) | 19 (17–23) | .124 | 21 (18–27) | 20 (17–25) | <0.001 | |
γ-glutamyl transpeptidase, U/L† | 24 (18–37) | 23 (18–32) | .004 | 28 (20–50) | 25 (19–40) | <0.001 | |
Triglycerides, mg/dL† | 83 (57–123) | 78 (57–108) | <0.001 | 98 (65–160) | 86 (62–124) | <0.001 | |
Female | n=1,433 | n=656 | | n=907 | n=947 | | |
Body mass index, kg/m² * | 20.12±3.06 | 20.18±2.88 | .069 | 20.99±3.67 | 20.89±3.40 | .552 | |
Waist circumference, cm* | 71.52±7.74 | 71.38±7.74 | .714 | 73.98±9.34 | 73.84±8.92 | .753 | |
Systolic blood pressure, mmHg * | 106.32±11.58 | 105.74±11.26 | .283 | 107.53±13.04 | 107.77±12.30 | .686 | |
Diastolic blood pressure, mmHg* | 64.32±8.70 | 64.14±8.67 | .664 | 65.40±9.50 | 66.17±9.43 | .081 | |
HDL-C, mg/dL* | 70.44±13.28 | 70.18±13.79 | .684 | 68.77±14.38 | 68.44±14.23 | .628 | |
LDL-C, mg/dL* | 99.67±25.47 | 99.52±23.38 | .896 | 106.49±27.49 | 106.54±26.81 | .971 | |
Blood glucose, mg/dL* | 84.55±8.07 | 84.85±6.13 | .408 | 86.09±12.47 | 86.65±7.40 | .238 | |
Uric acid, mg/dL* | 4.13±0.86 | 4.20±0.88 | .076 | 4.19±0.97 | 4.25±0.89 | .235 | |
Hemoglobin, g/dL* | 12.93±1.12 | 12.93±1.09 | .945 | 12.83±1.24 | 12.97±1.11 | .011 | |
Hematocrit, %* | 39.86±3.15 | 40.30±3.06 | .024 | 39.58±3.27 | 40.03±2.92 | .002 | |
Red blood cells, 10⁴/μL* | 436.82±31.09 | 438.85±30.48 | .169 | 436.27±34.96 | 442.45±31.15 | <0.001 | |
White blood cells, 10²/μL* | 58.78±16.50 | 57.90±15.33 | .258 | 59.00±16.02 | 58.46±15.54 | .470 | |
Alanine aminotransferase, U/L † | 12 (9–15) | 12 (10–15) | .671 | 12 (10–16) | 12 (10–15) | .106 | |
Aspartate aminotransferase, U/L† | 17 (15–19) | 17 (15–19) | .967 | 17 (15–19) | 17 (15–19) | .696 | |
γ-glutamyl transpeptidase, U/L† | 15 (12–18) | 15 (12–18) | .847 | 15 (12–19) | 15 (12–19) | .767 | |
Triglycerides, mg/dL† | 52 (41–69) | 52 (40–67) | .426 | 56 (45–78) | 57 (44–76) | .321 |
Questionnaire | Response | 30 years old, Excluded | 30 years old, Analysis | 30 years old, p-value | 35 years old, Excluded | 35 years old, Analysis | 35 years old, p-value |
---|---|---|---|---|---|---|---|
Male | | n=1,817 | n=2,342 | | n=1,283 | n=3,098 | |
Drinking alcohol every day‡ | Yes | 194 (11.0) | 237 (10.1) | .355 | 178 (14.0) | 442 (14.3) | .812 |
Drinking alcohol every day‡ | No | 1,567 (89.0) | 2,104 (89.9) | | 1,093 (86.0) | 2,641 (85.7) | |
Having breakfast‡ | Yes | 1,083 (61.9) | 1,499 (64.0) | .159 | 853 (67.0) | 2,112 (68.5) | .353 |
Having breakfast‡ | No | 668 (38.1) | 843 (36.0) | | 420 (33.0) | 972 (31.5) | |
Paying attention to nutritional balance‡ | Yes | 1,113 (73.7) | 1,605 (76.4) | .066 | 957 (75.7) | 2,486 (80.6) | <0.001 |
Paying attention to nutritional balance‡ | No | 398 (26.3) | 497 (23.6) | | 308 (24.3) | 597 (19.4) | |
Walking more than 1 hour per day‡ | Yes | 609 (34.7) | 848 (36.3) | .322 | 489 (38.5) | 1,148 (37.3) | .470 |
Walking more than 1 hour per day‡ | No | 1,144 (65.3) | 1,491 (63.7) | | 782 (61.5) | 1,933 (62.7) | |
Walking speed‡ | Yes | 855 (48.9) | 1,210 (51.7) | .077 | 614 (48.2) | 1,535 (49.8) | .351 |
Walking speed‡ | No | 894 (51.1) | 1,131 (48.3) | | 659 (51.8) | 1,548 (50.2) | |
Intention to improve health‡ | Yes | 1,276 (72.8) | 1,723 (73.7) | .544 | 874 (68.7) | 2,129 (69.1) | .801 |
Intention to improve health‡ | No | 477 (27.2) | 616 (26.3) | | 399 (31.3) | 954 (30.9) | |
Eating before bed‡ | Yes | 949 (54.2) | 1,313 (56.1) | .240 | 694 (54.5) | 1,782 (57.9) | .043 |
Eating before bed‡ | No | 802 (45.8) | 1,028 (43.9) | | 579 (45.5) | 1,296 (41.2) | |
Getting rest from sleep‡ | Yes | 936 (53.7) | 1,304 (55.9) | .171 | 699 (55.1) | 1,812 (58.5) | .028 |
Getting rest from sleep‡ | No | 806 (46.3) | 1,028 (44.1) | | 569 (44.9) | 1,270 (41.2) | |
Eating too fast‡ | Yes | 679 (38.8) | 846 (36.2) | .089 | 481 (37.8) | 1,063 (34.5) | .040 |
Eating too fast‡ | No | 1,072 (61.2) | 1,493 (63.8) | | 792 (62.6) | 2,021 (65.5) | |
Smoke cigarettes‡ | Yes | 707 (40.1) | 795 (33.9) | <0.001 | 467 (36.7) | 978 (31.7) | .002 |
Smoke cigarettes‡ | No | 1,055 (59.9) | 1,547 (66.1) | | 806 (63.3) | 2,105 (68.3) | |
Gained more than 10 kg in weight‡ | Yes | 448 (25.6) | 468 (20.0) | <0.001 | 493 (40.0) | 914 (31.2) | <0.001 |
Gained more than 10 kg in weight‡ | No | 1,305 (74.4) | 1,874 (80.8) | | 740 (60.0) | 2,014 (68.8) | |
Exercise more than twice a week‡ | Yes | 301 (17.2) | 408 (17.4) | .835 | 205 (16.1) | 544 (17.6) | .233 |
Exercise more than twice a week‡ | No | 1,452 (82.8) | 1,933 (82.6) | | 1,068 (83.9) | 2,539 (82.4) | |
Female | | n=1,433 | n=656 | | n=907 | n=947 | |
Drinking alcohol every day‡ | Yes | 54 (3.9) | 24 (3.70) | .902 | 48 (5.3) | 57 (6.0) | .548 |
Drinking alcohol every day‡ | No | 1,338 (96.1) | 632 (96.3) | | 850 (94.7) | 888 (94.0) | |
Having breakfast‡ | Yes | 1,055 (76.6) | 513 (78.2) | .463 | 706 (78.6) | 767 (81.1) | .201 |
Having breakfast‡ | No | 322 (23.4) | 143 (21.8) | | 192 (21.4) | 179 (18.9) | |
Paying attention to nutritional balance‡ | Yes | 1,048 (85.5) | 527 (85.0) | .781 | 769 (86.7) | 825 (87.3) | .728 |
Paying attention to nutritional balance‡ | No | 178 (14.5) | 93 (15.0) | | 118 (13.3) | 120 (12.7) | |
Walking more than 1 hour per day‡ | Yes | 334 (24.3) | 177 (27.0) | .190 | 221 (24.7) | 241 (25.5) | .707 |
Walking more than 1 hour per day‡ | No | 1,042 (75.7) | 479 (73.0) | | 675 (75.3) | 705 (74.5) | |
Walking speed‡ | Yes | 513 (37.4) | 247 (37.7) | <.922 | 365 (40.7) | 383 (40.5) | .962 |
Walking speed‡ | No | 858 (62.6) | 409 (62.3) | | 532 (59.3) | 563 (59.5) | |
Intention to improve health‡ | Yes | 1,156 (84.1) | 510 (78.0) | .001 | 713 (79.6) | 706 (74.8) | .015 |
Intention to improve health‡ | No | 218 (15.9) | 144 (22.0) | | 183 (20.4) | 238 (25.2) | |
Eating before bed‡ | Yes | 422 (30.8) | 268 (40.9) | <0.001 | 271 (30.2) | 297 (31.4) | .614 |
Eating before bed‡ | No | 949 (69.2) | 387 (59.1) | | 626 (69.8) | 649 (68.6) | |
Getting rest from sleep‡ | Yes | 725 (53.0) | 353 (53.9) | .739 | 439 (48.9) | 499 (52.9) | .085 |
Getting rest from sleep‡ | No | 643 (47.0) | 302 (46.1) | | 459 (51.1) | 444 (72.1) | |
Eating too fast‡ | Yes | 300 (21.9) | 141 (21.5) | .908 | 236 (26.3) | 204 (21.6) | .019 |
Eating too fast‡ | No | 1,072 (78.1) | 514 (78.5) | | 661 (73.7) | 741 (78.4) | |
Smoke cigarettes‡ | Yes | 216 (15.5) | 83 (12.7) | .094 | 116 (12.9) | 93 (9.8) | .040 |
Smoke cigarettes‡ | No | 1,177 (84.5) | 573 (87.3) | | 782 (87.1) | 853 (90.2) | |
Gained more than 10 kg in weight‡ | Yes | 105 (7.6) | 54 (8.3) | .659 | 131 (14.8) | 118 (12.8) | .246 |
Gained more than 10 kg in weight‡ | No | 1,269 (92.4) | 600 (91.7) | | 756 (85.2) | 804 (87.2) | |
Exercise more than twice a week‡ | Yes | 135 (9.8) | 67 (10.2) | .812 | 104 (11.6) | 81 (8.6) | .036 |
Exercise more than twice a week‡ | No | 1,241 (90.2) | 589 (89.8) | | 793 (88.4) | 865 (91.4) | |
Values are presented as mean ± SD or median (IQR). *t-test; †Mann–Whitney U test; ‡Chi-square test.
HDL-C, High-density lipoprotein cholesterol; IQR, interquartile range; LDL-C, low-density lipoprotein cholesterol; SD, standard deviation.
Values are presented as n (%)
Receiver operating characteristic (ROC) curves and precision-recall (PR) curves using overseas criteria for determining metabolic syndrome (International Diabetes Federation: IDF)
The important predictors for metabolic syndrome onset using overseas criteria for determining metabolic syndrome (International Diabetes Federation: IDF)
*Variable importance is expressed relative to the top variable, whose importance is set to 100%.
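The rescaling described in this footnote can be written out in a few lines. The sketch below is only an illustration: the raw importance scores are hypothetical numbers, not values from this study, and the function name is an assumption.

```python
def relative_importance(raw_importance):
    """Rescale importance scores so that the top variable equals 100%."""
    top = max(raw_importance.values())
    ranked = sorted(raw_importance.items(), key=lambda kv: kv[1], reverse=True)
    return {name: 100.0 * value / top for name, value in ranked}

# Hypothetical raw scores, for illustration only.
raw = {"diastolic blood pressure": 8.0, "body mass index": 6.0, "LDL-C": 2.0}
print(relative_importance(raw))
```

Because every score is divided by the same maximum, the ranking of the variables is unchanged; only the scale becomes comparable across models.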