Over the last half century, a number of new machine learning methods have been developed, including support vector machines (SVMs) and deep neural networks. These methods are highly accurate, but they lack explainability; in particular, deep neural networks provide no information about the importance of individual feature variables. High explainability is expected to help guarantee the reliability of prediction models beyond what the evaluation of prediction accuracy alone can provide. To address this problem, we have developed a factor analysis technique for nonlinear machine learning methods. The technique consists of two statistical steps. The first step, called backward analysis, generates the probability distributions of the positive and negative classes estimated by the prediction model. The second step uses backward elimination based on the Hilbert-Schmidt independence criterion (HSIC) to extract feature variables that are nonlinearly correlated with the outcome. We verified this factor analysis technique by simulation. In an experiment on gene expression data, we extracted new factors relevant to prostate cancer from the feature variables. The experimental results show that the technique has the potential to play a vital role in clinical research.
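To make the second step concrete, the following is a minimal sketch of HSIC-based backward elimination, assuming Gaussian kernels with a median-heuristic bandwidth and the biased empirical HSIC estimator. The function names (rbf_kernel, hsic, backward_elimination_hsic) and the toy data are illustrative assumptions, not the implementation used in the paper, and the backward-analysis step that produces the class probability distributions is not shown.

```python
import numpy as np

def rbf_kernel(x, sigma=None):
    """Gaussian (RBF) kernel matrix; bandwidth set by the median heuristic if unspecified."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    if sigma is None:
        positive = sq_dists[sq_dists > 0]
        sigma = np.sqrt(np.median(positive) / 2.0) if positive.size else 1.0
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def hsic(x, y):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K, L = rbf_kernel(x), rbf_kernel(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

def backward_elimination_hsic(X, y, n_keep):
    """Repeatedly drop the feature whose removal leaves the largest HSIC
    between the remaining feature set and the outcome y."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        best_rest, best_score = None, -np.inf
        for j in active:
            rest = [k for k in active if k != j]
            score = hsic(X[:, rest], y)
            if score > best_score:
                best_rest, best_score = rest, score
        active = best_rest
    return active

# Toy usage: the outcome depends nonlinearly on feature 0 only,
# so feature 0 should survive the elimination.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)
print(backward_elimination_hsic(X, y, n_keep=2))
```

Because every elimination pass rescores each candidate subset with a full kernel computation, this naive sketch scales poorly with the number of features; a more efficient scoring scheme would presumably be needed for gene-expression-scale data.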