In this paper, we discuss problems that arise in discriminant analysis of mark-sense test data. The test consists of 100 questions, each with 10 choices. Correct and incorrect answers are converted to 1/0 values, so the task is to discriminate two groups (pass and fail) using 100 independent variables $x_i$. The 100 questions are also summarized in six or nine sub-total scores.
The two groups are trivially linearly separable. Consider the linear discriminant function
$y = f(x) = \sum_{i} x_i - (\text{pass/fail score})$,
where $\sum_{i} x_i$ is the student's total score. If $y \ge 0$, the student passes the examination; otherwise, the student fails. Therefore, the number of misclassifications by this linear discriminant function is 0.
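The trivial discriminant function described above can be illustrated with a minimal sketch. The data here are randomly generated and purely hypothetical (200 students, a pass/fail score of 50); the function names `discriminant` and `count_misclassified` are illustrative and do not come from the paper. Because the pass/fail label is defined by the total score, the function separates the two groups by construction.

```python
import random

def discriminant(answers, pass_mark):
    """Return y = sum(x_i) - pass_mark for one student's 0/1 answer vector."""
    return sum(answers) - pass_mark

def count_misclassified(students, pass_mark):
    """Count students whose predicted group (y >= 0) differs from the true group."""
    errors = 0
    for answers in students:
        # True label: a student passes when the total score reaches the pass mark.
        passed = sum(answers) >= pass_mark
        predicted_pass = discriminant(answers, pass_mark) >= 0
        if passed != predicted_pass:
            errors += 1
    return errors

random.seed(0)
# Hypothetical data: 200 students, 100 questions scored as 1 (correct) or 0 (incorrect).
students = [[random.randint(0, 1) for _ in range(100)] for _ in range(200)]
PASS_MARK = 50  # assumed pass/fail score, not taken from the paper

misclassified = count_misclassified(students, PASS_MARK)
print("misclassifications:", misclassified)
```

All coefficients equal 1 in this sketch, which is the simplest function that separates the groups; it says nothing about how the statistical methods compared in the paper behave on the same data.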
Using these data, Fisher's linear discriminant function (LDF), the quadratic discriminant function, and logistic regression are compared with the optimal linear discriminant function (Revised IP-OLDF), which is based on the minimum number of misclassifications (MNM) criterion.
When discriminating with the 100 independent variables, the following problems are found. The stepwise variable selection methods select more than 28 independent variables, whereas Revised IP-OLDF finds that the data are linearly separable with fewer than 12 independent variables. In some cases, the quadratic discriminant function misclassifies all students of one group (pass or fail) into the other group. The standard errors of the logistic regression coefficients become very large.
When discriminating with the summarized sub-total scores, the numbers of misclassifications of LDF and the quadratic discriminant function are mostly greater than 0, whereas the MNM of Revised IP-OLDF is 0.