Abstract
Identifying the most relevant or interesting units is a common task in large-scale statistical inference. Recently, Henderson & Newton (2016; Journal of the Royal Statistical Society, Series B, 78, 781-804) proposed a new ranking measure, the r-value, to achieve optimal ranking in Bayesian inference. The r-value depends on the assumed Bayes model, and its ranking accuracy can be degraded by model misspecification. In medical and biological studies, such as genome-wide association studies, the large sets of candidate variables often consist of a mixture of null components (effect sizes are zero; non-interesting units) and nonnull components (effect sizes are non-zero; interesting units). In this article, to provide accurate rankings, we propose applying Bayesian hierarchical mixture modeling to ranking and selection inference. We also propose a semiparametric approach that uses Laird's nonparametric maximum likelihood estimation in empirical Bayes inference. With the mixture model, we can estimate the false discovery rate (FDR) for the selected highly ranked units. We assess the effectiveness of the proposed method via an application to a breast cancer clinical study.