Journal of the Japanese Society of Computational Statistics
Online ISSN : 1881-1337
Print ISSN : 0915-2350
ISSN-L : 0915-2350
SELECTION AND ORDERING OF VARIABLES IN DISCRIMINANT ANALYSIS AND APPLICATION TO BEHRENS-FISHER MODEL
Toshie Yamashita

1999 Volume 12 Issue 1 Pages 15-39

Abstract
In this paper, we discuss the selection and ordering of variables in discriminant analysis between two populations, based on the minimization of risk. For selection, we prove that, under certain assumptions, the risk of discrimination is minimized for any cut-off point if and only if the Matusita distance between the two populations is maximized. In other words, under these assumptions, we "connect" the classification of an observation into one of two populations (through the risk of discrimination) with the separation of the two populations (through the Matusita distance). The best selection from the full set of p variables is then the subset of variables that maximizes the Matusita distance between the two populations. Similarly, we regard as the most important variable for discrimination the one whose deletion minimizes the Matusita distance between the two populations (computed from the remaining p-1 variables). Under the assumptions, these methods minimize the risk of discrimination. When we have samples from the two populations, we construct hypotheses, each stating that the Matusita distance between the two populations is maximized by a particular subset of variables, and we select the best hypothesis. To avoid the problem of multiple testing, we treat this as model selection: we select the hypothesis, or model (expressing the Matusita distance), that gives the best prediction or discrimination for future observations. We obtain the AIC for each hypothesis and select the hypothesis that minimizes it. We also investigate the ordering of variables using samples. In practice, the ordering of variables becomes important when the number of hypotheses for selection (2^p - 1) is too large, whereas the number of hypotheses for ordering is only p (assuming large sample sizes). As an application, we consider the Behrens-Fisher problem. To evaluate whether the selected hypothesis maximizes the Matusita distance, and whether the ordered hypotheses rank the variables by their importance for discrimination, we conduct Monte Carlo simulations.
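The two population-level criteria described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes both populations are multivariate normal with known means and (possibly unequal) covariance matrices, as in a Behrens-Fisher setting, and it uses the standard closed form of the Matusita distance between two normal densities, M = sqrt(2(1 - rho)), where rho is the Bhattacharyya affinity. The function names (`matusita_distance`, `best_subset`, `order_by_deletion`) are hypothetical, the subset search is restricted to a fixed size k chosen here for illustration, and the sample-based AIC step is not reproduced.

```python
import itertools
import numpy as np


def matusita_distance(mu1, cov1, mu2, cov2):
    """Matusita distance between N(mu1, cov1) and N(mu2, cov2).

    Assumes normal populations with known parameters (an assumption of
    this sketch, not necessarily of the paper).
    """
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))
    cov_avg = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    # Mahalanobis-type term with the averaged covariance matrix.
    maha = diff @ np.linalg.solve(cov_avg, diff)
    # Log-determinants are used instead of determinants for stability.
    _, logdet_avg = np.linalg.slogdet(cov_avg)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    bhattacharyya = maha / 8.0 + 0.5 * (logdet_avg - 0.5 * (logdet1 + logdet2))
    affinity = np.exp(-bhattacharyya)  # rho, between 0 and 1
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - affinity))))


def _restrict(mu, cov, idx):
    """Keep only the variables listed in idx."""
    idx = list(idx)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return mu[idx], cov[np.ix_(idx, idx)]


def best_subset(mu1, cov1, mu2, cov2, k):
    """Size-k subset of variables that maximizes the Matusita distance."""
    p = len(mu1)

    def dist(idx):
        m1, c1 = _restrict(mu1, cov1, idx)
        m2, c2 = _restrict(mu2, cov2, idx)
        return matusita_distance(m1, c1, m2, c2)

    return max(itertools.combinations(range(p), k), key=dist)


def order_by_deletion(mu1, cov1, mu2, cov2):
    """Order variables from most to least important.

    A variable is more important when deleting it leaves a smaller
    Matusita distance on the remaining p-1 variables.
    """
    p = len(mu1)
    remaining = []
    for j in range(p):
        keep = [i for i in range(p) if i != j]
        m1, c1 = _restrict(mu1, cov1, keep)
        m2, c2 = _restrict(mu2, cov2, keep)
        remaining.append(matusita_distance(m1, c1, m2, c2))
    return sorted(range(p), key=lambda j: remaining[j])
```

For instance, with three variables where only the first has a large mean difference, the second differs slightly in mean and variance, and the third is identically distributed in both populations, `order_by_deletion` would rank the variables in that order. With known parameters the distance cannot decrease when variables are added, which is why this sketch compares subsets of equal size; the sample-based selection in the paper relies on AIC instead.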
© The Japanese Society of Computational Statistics