In this paper, we propose a new clustering method based on maximum likelihood (ML) estimation. In general, the problem of local minima arises when the ML method is applied to clustering problems. Our method circumvents this problem by employing the so-called simulated annealing technique. In section 2, we formulate our clustering problem using the ML concept and derive the ML estimation method. In section 3, the validity of the derived method is confirmed by analyzing two artificial data sets and the well-known Iris data. In the final section, our method is extended from the viewpoint of sequential estimation.
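As a rough illustration of how simulated annealing can help a likelihood-based clustering escape local minima, the following minimal sketch anneals over cluster labels. It is not the formulation derived in the paper: it assumes one-dimensional data and equal-variance Gaussian clusters, under which the negative log-likelihood of a labelling is proportional to the within-cluster sum of squares. The function names, the geometric cooling schedule, and all parameter values are hypothetical.

```python
import math
import random


def within_ss(points, labels, k):
    """Within-cluster sum of squares (assumed stand-in for the negative
    log-likelihood under equal-variance Gaussian clusters)."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:  # an empty cluster contributes nothing
            mean = sum(members) / len(members)
            total += sum((p - mean) ** 2 for p in members)
    return total


def anneal_clusters(points, k, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Anneal over labellings for k >= 2 clusters (hypothetical schedule)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    cost = within_ss(points, labels, k)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose moving one randomly chosen point to a different cluster.
        i = rng.randrange(len(points))
        old = labels[i]
        labels[i] = rng.choice([c for c in range(k) if c != old])
        new_cost = within_ss(points, labels, k)
        # Metropolis rule: always accept improvements, and accept a
        # worse labelling with probability exp(-increase / t).
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
        else:
            labels[i] = old  # reject the proposal
    return labels, cost


points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
labels, cost = anneal_clusters(points, k=2)
print(labels, cost)
```

Accepting occasional uphill moves at high temperature is what lets the search leave a poor labelling; as the temperature falls, the procedure behaves more and more like plain greedy descent.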
This paper compares an association model for trivariate contingency tables with the trivariate normal distribution. First, the similarity between the two models is discussed. Second, it is proved that the association model approximates the discretized trivariate normal distribution. An artificial illustration shows the degree of approximation in small tables. Third, estimation of the correlation coefficients of the underlying trivariate normal distribution is discussed. The present approach is illustrated with numerical examples.
An item response model, similar to that in test theory, was proposed for multiple-choice questionnaire data. In this model both subjects and item categories are represented as points in a multidimensional Euclidean space. The probability of a particular subject choosing a particular item category is stated as a decreasing function of the distance between the subject point and the item category point. The subject point is assumed to follow a certain distribution, and is then integrated out to derive marginal probabilities of response patterns. A marginal maximum likelihood (MML) method was developed to estimate coordinates of the item category points as well as distributional properties of the subject point. Bock and Aitkin's EM algorithm was adapted to the MML estimation of the proposed model. Examples were given to illustrate the method, which we call MAXMC.
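To make the distance-based choice rule concrete, here is a minimal sketch of one way such probabilities could be computed for the categories of a single item. The abstract does not give the functional form, so the choice of exp(-squared distance) normalized over the item's categories is purely an assumption, as are the function name and the example coordinates.

```python
import math


def choice_probabilities(subject, categories):
    """Probability of the subject choosing each category of one item.

    Assumed form: weight exp(-squared Euclidean distance), normalized
    over the categories, so a closer category is always more probable.
    """
    sq_dist = [
        sum((s - c) ** 2 for s, c in zip(subject, category))
        for category in categories
    ]
    # Subtract the smallest squared distance before exponentiating for
    # numerical stability; the normalized result is unchanged.
    base = min(sq_dist)
    weights = [math.exp(-(d - base)) for d in sq_dist]
    total = sum(weights)
    return [w / total for w in weights]


subject = (0.0, 0.0)
categories = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(choice_probabilities(subject, categories))
```

In a marginal likelihood setting, probabilities of this kind would then be averaged over the assumed distribution of the subject point, for example by numerical quadrature.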
Asymmetric cluster analysis was applied to the enrollment flow from high school to university among the Japanese prefectures in order to identify and compare university enrollment regions before and after the introduction of the Joint First Stage Achievement Test (JFSAT). The asymmetric cluster analysis used in the present study is characterized by two aspects. One is that it is based on mean clustering, so that small university enrollment regions can be identified as well. The other is that it allows clustering based on self-similarity. Eight university enrollment regions were identified both before and after the JFSAT, and seven of the eight were centered at the same prefectures before and after the JFSAT, suggesting that university enrollment regions were almost unchanged. There were, however, differences in some respects. (a) After the JFSAT, university enrollment regions that were not geographically between the two largest university enrollment regions, centered at Tokyo and Kyoto (that is, those north of the one centered at Tokyo or west of the one centered at Kyoto), became larger and more independent from the one centered at Tokyo. (b) After the JFSAT, university enrollment regions between the two largest enrollment regions became neither larger nor more independent.
When we select N0 students from N applicants on the basis of a composite score of subtests, it is important to evaluate the contribution of each subtest. The swap-rate, which is defined as the proportion of the applicants who actually pass the examination but would fail if the j-th subtest were not included in the composite used to rank the applicants, is one of the measures of the contribution of the j-th subtest. In this article, we first derive the characteristics and limiting properties of the population swap-rate. Next, using the properties of order statistics and the extended hypergeometric distribution, we derive an approximation to the asymptotic variance of the sample swap-rate when the number of applicants is large. Finally, we propose using our analytic approximation to the variance of the sample swap-rate in real data problems and show that it is very efficient.
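The definition of the sample swap-rate can be sketched in a few lines. This is an illustration only, under several assumptions not stated in the abstract: the composite is taken to be the unweighted sum of subtest scores, ties are broken by applicant order, and the proportion is taken relative to the N0 passers. The function name and the example scores are hypothetical.

```python
def swap_rate(scores, j, n_pass):
    """Sample swap-rate of subtest j.

    scores: one list of subtest scores per applicant.
    n_pass: number of applicants selected (N0).
    Assumes an unweighted-sum composite and a denominator of n_pass.
    """
    def passers(totals):
        # Rank by composite score, highest first; ties go to the
        # applicant listed earlier (an arbitrary assumption).
        order = sorted(range(len(totals)), key=lambda i: (-totals[i], i))
        return set(order[:n_pass])

    full = [sum(row) for row in scores]               # all subtests
    reduced = [sum(row) - row[j] for row in scores]   # subtest j dropped
    # Applicants who pass on the full composite but would fail without j.
    swapped = passers(full) - passers(reduced)
    return len(swapped) / n_pass


scores = [[90, 10], [50, 60], [40, 40], [80, 5]]
print(swap_rate(scores, j=1, n_pass=2))  # 0.5
```

In the example, dropping the second subtest replaces one of the two passers, so half of the selected applicants owe their selection to that subtest.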