Abstract
Michiels et al. (2005) showed that a list of genes identified as predictors of prognosis by a non-repeated training-validation approach is unstable, and they advocated validation by repeated random sampling. They regarded genes that were selected among the top 50 genes in more than half of their jackknife samples as stable for prediction. However, there is no rationale for the choice of either the length of the gene list or the stability threshold. Because a stability assessment essentially requires evaluating how low p-values accumulate across the repeated random samples, it is better to compare the observed distribution of a gene's p-values directly with the distribution of p-values under the null hypothesis. In this study, we propose the quantile-quantile plot (Q-Q plot) of p-values with a null reference for this purpose. We applied the proposed method to a clinical data set for primary breast cancer. The Q-Q plot approach reveals that genes with similar p-values in the ordinary analysis can have different p-value distributions under repeated random sampling, and that a gene whose low p-values accumulate under repeated random sampling can be evaluated against the reference lines in the Q-Q plot.
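As a rough illustration of the idea, the sketch below computes the coordinates of a Q-Q plot of p-values on the -log10 scale. It is not the procedure used in the study: it assumes, as a simplification, that p-values under the null hypothesis behave like independent Uniform(0,1) draws, so the reference is the line y = x. The function name `qq_points`, the number of resamples (200), and both simulated genes are hypothetical and serve only to show the shape of the comparison.

```python
import math
import random


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale.

    Assumption: under the null hypothesis the p-values are independent
    Uniform(0,1) draws, so the i-th smallest of B values is expected
    near i / (B + 1).
    """
    b = len(p_values)
    points = []
    for i, p in enumerate(sorted(p_values), start=1):
        expected = i / (b + 1)
        observed = max(p, 1e-300)  # guard against log10(0)
        points.append((-math.log10(expected), -math.log10(observed)))
    return points


# Synthetic p-values for two hypothetical genes over 200 resamples.
rng = random.Random(0)
null_like_gene = [rng.random() for _ in range(200)]      # uniform p-values
stable_gene = [rng.random() ** 3 for _ in range(200)]    # skewed toward 0

null_pts = qq_points(null_like_gene)
stable_pts = qq_points(stable_gene)

# Points above the line y = x indicate an excess of low p-values.
above = sum(1 for e, o in stable_pts if o > e)
print(f"stable gene: {above} of {len(stable_pts)} points above the null line")
```

Plotting the observed values against the expected values for each gene, together with the line y = x, gives the visual comparison described above. Because resamples drawn from the same data set are correlated, a reference derived under independence is only a first approximation.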