Sparse estimation has become the standard approach to regression analysis when there are many candidate explanatory variables. For selecting the regularization parameter in sparse estimation, however, even when an AIC for the sparse estimator is available in a simple form and the goal is good prediction, that AIC is not always used; in other words, no standard method appears to be established. In this paper, we conduct numerical experiments to evaluate the performance of models estimated by combining LASSO with AIC in such a setting, specifically when LASSO is used in normal linear regression analysis. We first compare this combination, in terms of prediction squared error, with ridge regularization combined with AIC, best subset regression using maximum likelihood and the ordinary AIC, and LASSO combined with cross-validation. For the cross-validation method, we also examine how much the estimation results vary with the data partitioning, by numerically evaluating the variability of the prediction squared error and of the degrees of freedom of the selected model. The AIC for LASSO is derived from SURE theory; since this derivation does not seem to be well known, we present it in a slightly generalized setting at the end.
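As a rough illustration of the LASSO-plus-AIC combination described above, the following NumPy-only sketch fits LASSO by coordinate descent over a grid of regularization parameters and selects the one minimizing an AIC in which the degrees of freedom equal the number of nonzero coefficients (the standard LASSO df formula from SURE theory). The simulated data, the grid, and the coordinate-descent solver are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    # Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]      # partial residual excluding j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def aic_lasso(X, y, b):
    # AIC with LASSO degrees of freedom = number of nonzero coefficients
    n = len(y)
    rss = ((y - X @ b) ** 2).sum()
    df = np.count_nonzero(b)
    return n * np.log(rss / n) + 2 * df

# Illustrative sparse normal linear regression data (not the paper's design)
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                     # sparse true coefficients
y = X @ beta + rng.standard_normal(n)

lams = np.logspace(-3, 0, 20)                   # regularization grid
fits = [lasso_cd(X, y, lam) for lam in lams]
aics = [aic_lasso(X, y, b) for b in fits]
best = fits[int(np.argmin(aics))]
print("selected support:", np.flatnonzero(best))
```

The AIC here is the Gaussian-likelihood form with the noise variance profiled out; the paper's comparisons (ridge + AIC, best subset + ordinary AIC, LASSO + cross-validation) would plug different estimators and selection criteria into the same evaluation loop.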