2004 Volume 24 Issue 1 Pages 35-44
Objectives: To evaluate whether text mining can select diagnosis-specific words from discharge summaries and distinguish the diseases they describe. Materials and methods: 4,317 discharge summaries covering 13 representative diseases were selected at Chiba University Hospital. Diagnosis-related terms were extracted by morphological analysis. The diseases were then compared with one another using a tf×idf vector space model, and the important specific words for each disease were selected. Furthermore, we applied the vector space model to new cases and displayed the resulting vector as a radar chart. Results: 7,918 words were selected from the cases, and 74% of 390 cases were correctly diagnosed. The maximum-tree problem and dendrogram methods showed reasonable relationships among the 13 diseases. Conclusion: These results suggest that text mining is applicable to the automatic classification of medical documents according to diagnosis.
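The tf×idf vector space step can be illustrated with a minimal sketch. It is not the authors' implementation: the disease labels and tokens are invented toy data, the function names (`tfidf_vectors`, `cosine`, `classify`) are hypothetical, and cosine similarity and the idf form log(N/df) are assumed, since the abstract specifies neither the similarity measure nor the weighting variant.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one tf-idf vector per disease from {label: [tokens]}."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))          # document frequency per word
    idf = {w: math.log(n / df[w]) for w in df}
    vectors = {}
    for label, tokens in docs.items():
        tf = Counter(tokens)
        vectors[label] = {w: tf[w] * idf[w] for w in tf}
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(tokens, vectors, idf):
    """Score a new case against every disease vector; return best label and all scores."""
    tf = Counter(tokens)
    query = {w: tf[w] * idf[w] for w in tf if w in idf}
    scores = {label: cosine(query, vec) for label, vec in vectors.items()}
    return max(scores, key=scores.get), scores

# Toy data (assumption): words as they might come out of morphological analysis.
docs = {
    "asthma": ["wheeze", "inhaler", "dyspnea", "cough"],
    "pneumonia": ["fever", "cough", "infiltrate", "antibiotic"],
    "gastric_cancer": ["gastrectomy", "endoscopy", "biopsy", "fever"],
}
vectors, idf = tfidf_vectors(docs)
label, scores = classify(["cough", "fever", "infiltrate"], vectors, idf)
print(label, scores)
```

In this sketch the per-disease entries of `scores` play the role of the values that would be plotted on the axes of a radar chart for a new case.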