Abstract
Data mining methods have been used in many fields and have recently drawn attention in medicine as well. To identify the best method for predicting arteriosclerosis, we compared three statistical methods with conventional linear discriminant analysis in terms of hit ratio (prediction accuracy). We enrolled 502 healthy subjects, diagnosed arteriosclerosis by measuring brachial-ankle pulse wave velocity (baPWV), and constructed four prediction models: linear discriminant analysis, logistic regression, a neural network, and a decision tree. Accuracy was evaluated once as an internal check and in 12 repeated experiments as an external check. The decision tree model had the highest accuracy in both the internal check (93.6%) and the external check (83.2±2.9%). Although the linear discriminant model (79.8±9.3%) was inferior to the decision tree model (p=0.002), its hit ratio was significantly higher than those of the logistic regression model (75.4±2.7%) and the neural network (74.8±4.4%). In predicting arteriosclerosis as a binary outcome, the decision tree model thus combined the highest hit ratio with a small standard deviation. An automatic diagnostic system that applies these multivariate techniques in coordination with an electronic medical record system could reduce human errors such as misdiagnosis and underdiagnosis. This study indicates that the decision tree model (C5.0 algorithm) is a candidate for the preliminary diagnosis of multifactorial diseases such as atherosclerosis.
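The evaluation described above (hit ratio as the fraction of correct binary predictions, summarized as mean ± SD over 12 repeated experiments) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the random 70/30 re-splitting, the use of the sample standard deviation, the function names `hit_ratio`, `repeated_holdout` and `majority_class`, and the placeholder model are all hypothetical. Any of the four models compared in the study (including a C5.0 decision tree) would be supplied as the `fit` argument.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence

# Hypothetical model interface: trained on (X, y), returns a per-subject predictor.
Predictor = Callable[[Sequence[float]], int]
Model = Callable[[List[Sequence[float]], List[int]], Predictor]


def hit_ratio(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of subjects whose predicted label (1 = arteriosclerosis) matches the observed one."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("labels must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def repeated_holdout(X, y, fit: Model, repeats: int = 12,
                     test_frac: float = 0.3, seed: int = 0) -> List[float]:
    """Assumed protocol: re-split the data at random `repeats` times and
    return the hit ratio (in percent) on each held-out test set."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = max(1, int(len(y) * test_frac))
    ratios = []
    for _ in range(repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(X[i]) for i in test]
        ratios.append(100 * hit_ratio([y[i] for i in test], preds))
    return ratios


def majority_class(X_train, y_train) -> Predictor:
    """Placeholder model (not one of the study's four): always predicts the commonest label."""
    label = max(set(y_train), key=list(y_train).count)
    return lambda x: label


if __name__ == "__main__":
    # Toy data for illustration only; not the study's 502 subjects.
    X = [[float(i)] for i in range(20)]
    y = [0] * 8 + [1] * 12
    ratios = repeated_holdout(X, y, majority_class)
    print(f"hit ratio: {mean(ratios):.1f} ± {stdev(ratios):.1f}%")
```

Reporting the spread across repetitions, not just the mean, is what allows the comparison in the abstract: a model such as linear discriminant analysis can have a respectable mean hit ratio while its larger standard deviation (9.3% versus 2.9%) signals less stable performance across data splits.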