バイオメディカル・ファジィ・システム学会大会講演論文集
Online ISSN : 2424-2586
Print ISSN : 1345-1510
ISSN-L : 1345-1510

Comparison of Data Mining Methods Using Breast Cancer Data (General Session 2B)
遠藤 有人, 柴田 健雄, 田中 博

p. 71-74

Abstract

Objective: In the United States today, about one in eight women is affected by breast cancer during her lifetime. The incidence rate has kept increasing in recent years, and data show a five-year survival rate of 88% and a 10-year survival rate of 80%. In Japan, the incidence of breast cancer has also been increasing recently. Since 1995, breast cancer has ranked first in incidence among cancers in women and has received a great deal of public attention. Past studies have constructed various prediction models using SEER (Surveillance, Epidemiology, and End Results) datasets. However, appropriate methods for predicting breast cancer survival have not been established. In this paper, we present optimal models for predicting the five-year survival of breast cancer patients.

Materials and Methods: This study used data on 37,356 patients who were diagnosed with breast cancer and registered in the SEER program from 1992 to 1997, with follow-up through 2002. We developed prediction models with seven algorithms: six common data mining algorithms (Artificial Neural Network (Multilayer Perceptron), Naive Bayes, Bayes Net, Decision Trees with naive Bayes, Decision Trees (ID3), and Decision Trees (J48)) and the most widely used statistical method, the Logistic Regression model. We used 10-fold cross-validation to obtain an unbiased estimate of the performance of the seven prediction models so that the methods could be compared.

Results: The accuracy was 85.8±0.2% for the Logistic Regression model, 84.5±1.4% for the Artificial Neural Network (Multilayer Perceptron), 83.9±0.2% for Naive Bayes, 83.9±0.2% for Bayes Net, 84.2±0.2% for Decision Trees with naive Bayes, 82.3±0.2% for Decision Trees (ID3), and 85.6±0.2% for Decision Trees (J48).

Conclusion: In this study, the Logistic Regression model showed the highest accuracy. The Decision Trees (J48) model had the highest sensitivity, and the Artificial Neural Network had the highest specificity. The Decision Trees models tended to show high sensitivity, and the Bayesian models tended to show improved accuracy. We found that the optimal algorithm may differ depending on the prediction target and the dataset.
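
The evaluation procedure described in the abstract (10-fold cross-validation, with accuracy, sensitivity, and specificity compared across models) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and not the authors' implementation: the data are synthetic rather than SEER records, the two models (`fit_majority`, `fit_threshold`) are hypothetical stand-ins for the seven algorithms, and the ± value is taken to be the standard deviation across folds, which the abstract does not state.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


def cross_validate(fit, X, y, k=10, seed=0):
    """Run k-fold cross-validation; return (mean, std) of each metric."""
    per_fold = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # Fit on the other k-1 folds, then predict the held-out fold.
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = [predict(X[i]) for i in held_out]
        per_fold.append(confusion_metrics([y[i] for i in held_out], y_pred))
    names = ("accuracy", "sensitivity", "specificity")
    return {
        name: (statistics.mean(col), statistics.stdev(col))
        for name, col in zip(names, zip(*per_fold))
    }


def fit_majority(X_train, y_train):
    """Hypothetical baseline: always predict the most common training label."""
    majority = 1 if sum(y_train) * 2 >= len(y_train) else 0
    return lambda x: majority


def fit_threshold(X_train, y_train):
    """Hypothetical one-feature rule: cut at the midpoint of the class means."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    cut = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda x: 1 if x > cut else 0


# Synthetic data only (not SEER): one numeric feature, label 1 = survived.
rng = random.Random(42)
y = [1 if rng.random() < 0.8 else 0 for _ in range(500)]
X = [rng.gauss(1.0 if t == 1 else -1.0, 1.0) for t in y]

results = {
    "majority": cross_validate(fit_majority, X, y),
    "threshold": cross_validate(fit_threshold, X, y),
}
for model, metrics in results.items():
    summary = ", ".join(f"{m} {mean:.3f}±{sd:.3f}" for m, (mean, sd) in metrics.items())
    print(f"{model}: {summary}")
```

The majority baseline shows why accuracy alone can mislead on imbalanced survival data: it can reach high accuracy while its specificity is zero, which is one reason to report sensitivity and specificity alongside accuracy.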

© 2007 バイオメディカル・ファジィ・システム学会