Paper ID: 2025TAP0023
Model selection is a critical step in data analysis and machine learning, particularly in prediction tasks where the true underlying model is rarely known. Although numerous techniques have been proposed, most traditional methods select the same model regardless of the predictor variables. In practice, however, predictor variables may be fully or partially available at the time of prediction, and exploiting this information is expected to improve predictive accuracy. In this paper, we present a novel model selection framework, grounded in statistical decision theory, in which the chosen model varies with the values of the predictor variables. We begin by defining a loss function that explicitly incorporates the predictor variables and then derive the corresponding Bayes risk function. Subsequently, we present an expression for the model selection that minimizes this Bayes risk. Following the same procedure, we also define a loss function for the scenario in which predictor variables are unavailable and derive an expression that minimizes the Bayes risk in that setting. Building on these formulations, we establish a theorem for a model selection method that minimizes Bayes risk, which yields explicit selection criteria under commonly used loss functions, including the logarithmic and squared error losses. Furthermore, by applying this theorem, we demonstrate a connection to the existing Procedure for Optimal Predictive Model Selection (POPMOS). In particular, we show that POPMOS, originally devised to minimize the Kullback–Leibler divergence between each model's predictive distribution and the posterior predictive distribution, arises as a special case of our general Bayes risk minimization framework when the logarithmic loss function is employed. We validate the effectiveness of our approach through extensive simulations on synthetic data, demonstrating that our framework not only reduces prediction error but also compares favorably with existing model selection techniques.
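As a rough illustration of the predictor-dependent selection rule summarized above (the notation $x$, $y$, $\mathcal{M}$, $\ell$, $p_m$, and $\mathcal{D}$ is introduced here for exposition and is not taken from the paper), the idea can be sketched as choosing, for each observed predictor value $x$, the candidate model whose posterior expected loss is smallest:
\[
\hat{m}(x) \;=\; \operatorname*{arg\,min}_{m \in \mathcal{M}} \; \mathbb{E}\bigl[\, \ell\bigl(y,\, p_m(\cdot \mid x, \mathcal{D})\bigr) \;\big|\; x, \mathcal{D} \,\bigr],
\]
where $\mathcal{M}$ is the set of candidate models, $p_m(\cdot \mid x, \mathcal{D})$ denotes model $m$'s predictive distribution, and the expectation over $y$ is taken under the posterior predictive distribution given $x$ and the data $\mathcal{D}$. Under the logarithmic loss $\ell(y, p) = -\log p(y)$, minimizing this posterior expected loss amounts to choosing the model whose predictive distribution is closest in Kullback–Leibler divergence to the posterior predictive distribution, which is consistent with the POPMOS connection stated above; this sketch is only indicative of the framework, not the paper's exact formulation.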