BioScience Trends
Online ISSN : 1881-7823
Print ISSN : 1881-7815
ISSN-L : 1881-7815
Original Article
Predicting non-alcoholic fatty liver disease (NAFLD) using machine learning algorithms: Evidence from a large-scale community cohort in Taiwan
Tzu-Chun Lin, Yu-Ju Wei, Po-Cheng Liang, Pei-Chien Tsai, Yi-Hung Lin, Meng-Hsuan Hsieh, Tyng-Yuan Jang, Chih-Wen Wang, Ming-Yen Hsieh, Zu-Yau Lin, Ming-Lun Yeh, Jee-Fu Huang, Chung-Feng Huang, Wan-Long Chuang, Ming-Lung Yu, Chia-Yen Dai, Hon-Yi Shi

2026 Volume 20 Issue 1 Pages 80-90

Abstract

Closely associated with metabolic disorders, non-alcoholic fatty liver disease (NAFLD) substantially increases the risk of hepatocellular carcinoma. This study aimed to apply machine learning (ML) algorithms to a community-based cohort in southern Taiwan to identify key risk factors for NAFLD and to develop predictive models with clinical applicability. Data were derived from community health examinations, and eighteen clinical and demographic features were analyzed. Five ML algorithms were evaluated: logistic regression (LR), random forest (RF), K-nearest neighbors (KNN), adaptive boosting (AdaBoost), and extreme gradient boosting (XGBoost). Model performance was assessed using accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUROC). A total of 7,510 participants were included (38.8% male; mean age 50.9 ± 15.0 years). The dataset was randomly divided into training (80%) and testing (20%) subsets, with no significant differences observed between groups in most independent variables. The Synthetic Minority Over-sampling Technique (SMOTE) was employed to balance NAFLD and non-NAFLD groups in the training dataset. Among all models, XGBoost achieved the highest performance, with an accuracy of 83.48%, precision of 84.31%, recall of 81.21%, F1 score of 82.72%, and AUROC of 92.85%. Feature importance analysis identified low-density lipoprotein cholesterol (LDL-C), body mass index (BMI), waist circumference, fasting plasma glucose (FPG), and triglycerides (TG) as the most influential predictors of NAFLD. ML algorithms, particularly XGBoost, demonstrated high accuracy in predicting NAFLD and effectively identified key clinical predictors. These findings may enhance early diagnosis and facilitate the development of targeted intervention strategies in the management of NAFLD.
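
The evaluation metrics named in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation; it only shows how accuracy, precision, recall, and F1 score follow from confusion-matrix counts, and checks that the reported XGBoost F1 score is consistent with the reported precision and recall. The function names and the example counts are hypothetical, and the split sizes assume an exact 80/20 division, which the abstract does not report.

```python
def f1_from_precision_recall(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives (hypothetical helper, for illustration only).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


# XGBoost values as reported in the abstract (fractions, not percentages).
reported_precision = 0.8431
reported_recall = 0.8121
reported_f1 = 0.8272

# Recompute F1 from the reported precision and recall; small differences
# are expected because the published inputs are already rounded.
recomputed_f1 = f1_from_precision_recall(reported_precision, reported_recall)

# Approximate subset sizes for 7,510 participants, ASSUMING an exact 80/20
# split (the abstract gives only the proportions).
n_total = 7510
n_test = round(n_total * 0.20)
n_train = n_total - n_test

# Made-up confusion-matrix counts, used only to exercise the helper.
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)

if __name__ == "__main__":
    print(f"recomputed F1: {recomputed_f1:.4f} (reported: {reported_f1:.4f})")
    print(f"assumed split: train={n_train}, test={n_test}")
    print(example)
```

The recomputed F1 agrees with the reported 82.72% to within rounding of the published precision and recall, which is what the harmonic-mean definition predicts.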

© International Research and Cooperation Association for Bio & Socio-Sciences Advancement