Environmental Health and Preventive Medicine
Online ISSN : 1347-4715
Print ISSN : 1342-078X
ISSN-L : 1342-078X
Application of machine learning algorithms in predicting new onset hypertension: a study based on the China Health and Nutrition Survey
Manhui Zhang, Xian Xia, Qiqi Wang, Yue Pan, Guanyi Zhang, Zhigang Wang

2025 Volume 30 Pages 3

Abstract

Background: Hypertension is a serious chronic disease that substantially increases the risk of cardiovascular disease and can damage vital organs such as the heart, brain, and kidneys. We aimed to use machine learning algorithms to predict the risk of new onset hypertension and to identify the characteristics of patients who develop it.

Methods: We analyzed data from the China Health and Nutrition Survey, using the 2011 cohort of individuals who were not hypertensive at baseline and who had follow-up results available by 2015. We evaluated four traditional machine learning algorithms commonly used in epidemiological studies (Logistic Regression, Support Vector Machine, XGBoost, and LightGBM) and two deep learning algorithms (TabNet and AMFormer). Each algorithm was trained on two feature sets, one with 16 features and one with 29 features. SHAP values were used to identify the key features associated with new onset hypertension.
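
SHAP attributions are built on Shapley values, and the idea can be illustrated with a minimal sketch. The risk model, its coefficients, the reference participant, and the three example rows below are all hypothetical and are not taken from the study. The sketch computes exact Shapley values by enumerating feature subsets, which is only practical for a handful of features; it is not the procedure the authors or the SHAP library would use for 29 features.

```python
import math
from itertools import combinations


def predict(x):
    """Hypothetical risk model: logistic function of three made-up features."""
    age, waist, bmi = x
    z = -12.0 + 0.05 * age + 0.06 * waist + 0.10 * bmi
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take the baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def rank_features(f, rows, baseline, names):
    """Rank features by mean absolute Shapley value across rows."""
    totals = [0.0] * len(names)
    for row in rows:
        for i, value in enumerate(shapley_values(f, row, baseline)):
            totals[i] += abs(value)
    means = [t / len(rows) for t in totals]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


NAMES = ["age", "waist_cm", "bmi"]
BASELINE = [45.0, 80.0, 23.0]  # illustrative reference participant (assumption)
ROWS = [[62.0, 95.0, 28.0], [38.0, 74.0, 21.0], [55.0, 88.0, 25.0]]  # made-up data

if __name__ == "__main__":
    for name, score in rank_features(predict, ROWS, BASELINE, NAMES):
        print(f"{name}: {score:.4f}")
```

For each participant, the attributions sum to the difference between the prediction for that participant and the prediction for the reference participant, which is what allows per-feature contributions to be compared and ranked.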

Results: A total of 4,982 participants were included in the analysis, of whom 1,017 developed hypertension during the 4-year follow-up. Among the 16-feature models, Logistic Regression had the highest AUC, at 0.784 (0.775∼0.806). Among the 29-feature models, AMFormer performed best, with an AUC of 0.802 (0.795∼0.820), and it also scored highest on MCC (0.417, 95% CI: 0.400∼0.434) and F1 (0.503, 95% CI: 0.484∼0.505), indicating better overall performance than the other models. The key features selected based on the AMFormer model included age, province, waist circumference, urban or rural location, education level, employment status, weight, waist-to-hip ratio (WHR), and body mass index (BMI).
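
The three reported metrics can be computed as in the sketch below, which uses made-up labels and scores instead of study data. Two points are assumptions: the abstract does not say how the confidence intervals were obtained, so the percentile bootstrap here is only one common choice, and the 0.5 classification threshold used for MCC and F1 is likewise illustrative.

```python
import math
import random


def auc(y_true, scores):
    """AUC: probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn


def f1(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bootstrap_ci(metric, y_true, values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, values)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(metric(ys, [values[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up example: 1 = developed hypertension, 0 = did not.
Y_TRUE = [0, 0, 1, 1]
SCORES = [0.1, 0.4, 0.35, 0.8]
Y_PRED = [1 if s >= 0.5 else 0 for s in SCORES]  # threshold is an assumption

if __name__ == "__main__":
    print("AUC:", auc(Y_TRUE, SCORES), bootstrap_ci(auc, Y_TRUE, SCORES))
    print("MCC:", round(mcc(Y_TRUE, Y_PRED), 3))
    print("F1:", round(f1(Y_TRUE, Y_PRED), 3))
```

MCC and F1 depend on the chosen threshold, whereas AUC does not, which is one reason a model can rank first on AUC without necessarily ranking first on the other two.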

Conclusion: This study is the first to use the AMFormer model to predict new onset hypertension, and the model achieved the best results among the six algorithms tested. Key features associated with new onset hypertension can be identified with this algorithm. Applying machine learning algorithms can further improve disease prediction and help identify disease risk factors.


© The Author(s) 2025.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.