Organizer: Japan Human Proteome Organization (日本ヒトプロテオーム機構)
Alzheimer's disease (AD) is the most common form of dementia and leads to irreversible neurodegenerative damage of the brain. AD affects nearly 10% of the population over 65 years of age. Although AD progresses slowly and several years pass between the onset of cognitive decline and diagnosis, current diagnostic tools have poor sensitivity, especially for the early stages of AD, and do not allow diagnosis until AD has led to irreversible brain damage. It is therefore crucial to detect AD as early as possible. Because gathering many AD and non-AD samples is difficult, laborious, and time-consuming, it is highly desirable to develop a predictive learning method that achieves high performance by using both training samples and test samples. To address this problem, we propose semi-supervised distance metric learning using Random Forests with label propagation (SRF-LP), which uses labeled data to learn a good distance metric and then propagates labels based on that metric. We applied SRF-LP to cytokine antibody array datasets produced from plasma samples for Alzheimer's classification and diagnosis. The datasets consist of a training set of 83 samples and a test set of 92 samples. The training set comprises 43 AD patients and 40 non-AD individuals, and the test set comprises 42 AD patients and 50 non-AD individuals. Experimental results showed that SRF-LP outperformed standard supervised learning algorithms, namely RF, SVM, AdaBoost, and CART, reaching a maximum accuracy of 93.1%. In particular, SRF-LP outperformed these algorithms by a large margin when the number of training samples was very small. We thus demonstrated that labeled data should be incorporated into distance metric learning and that the learned metrics are appropriate for label propagation. Moreover, we showed that SRF-LP can achieve accuracy (89%) comparable to that of the original classifier, NSC (Nearest Shrunken Centroid), with about half the number of training samples.
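The two ingredients named above, a forest-derived similarity and label propagation over it, can be illustrated with a minimal sketch. It is not the SRF-LP implementation: it assumes that the metric is the common Random Forest proximity (the fraction of trees in which two samples land in the same leaf) and that propagation is a simple clamped weighted average. The leaf assignments, the function names `forest_proximity` and `propagate_labels`, and the toy labels are all hypothetical.

```python
# Illustrative sketch only; not the authors' SRF-LP implementation.
# Assumption: leaf indices per tree are already available (in practice they
# would come from a trained forest); here they are made-up toy values.


def forest_proximity(leaves):
    """leaves[i][t] is the leaf that sample i reaches in tree t.
    Returns the fraction of trees in which each pair of samples shares a leaf."""
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = shared / n_trees
    return prox


def propagate_labels(prox, labels, n_classes, n_iter=100):
    """labels[i] is a class index for labeled samples and None for unlabeled ones.
    Each unlabeled sample repeatedly takes the proximity-weighted average of the
    other samples' class scores; labeled samples stay clamped to their label."""
    n = len(prox)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            scores[i][y] = 1.0
    for _ in range(n_iter):
        new_scores = [row[:] for row in scores]
        for i in range(n):
            if labels[i] is not None:
                continue  # keep known labels fixed
            total = sum(prox[i][j] for j in range(n) if j != i)
            if total == 0.0:
                continue  # isolated sample: nothing to propagate from
            for c in range(n_classes):
                weighted = sum(prox[i][j] * scores[j][c] for j in range(n) if j != i)
                new_scores[i][c] = weighted / total
        scores = new_scores
    # Predicted class = highest score (ties resolve to the lower class index).
    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]


# Toy data: 6 samples, 3 trees. Class 1 = AD, class 0 = non-AD (hypothetical).
leaves = [
    [0, 1, 0],
    [0, 1, 2],
    [0, 3, 0],
    [1, 2, 1],
    [1, 2, 3],
    [1, 0, 1],
]
labels = [1, None, None, 0, None, None]  # only samples 0 and 3 are labeled

prox = forest_proximity(leaves)
predicted = propagate_labels(prox, labels, n_classes=2)
print(predicted)  # [1, 1, 1, 0, 0, 0]
```

In this toy setting, the two labeled samples are enough to label the remaining four, because the proximity matrix separates the samples into two groups. This is the intuition behind using unlabeled test samples when labeled training samples are scarce.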