ESTIMATION OF PROBABILITIES AFTER CLUSTERING

Kai Fun Yu

doi:10.11329/jjss1970.23.171

Abstract

A sample is taken from a mixture of two sub-populations. The characteristics of the two sub-populations are to be estimated. If each observation in the sample is completely identified, that is, if one knows which observation comes from which sub-population, then the estimation can follow standard methods and is straightforward. However, if the observations are not identified, then some clustering procedure has to be applied to classify the observations into two sub-populations. A reasonable estimate turns out to be inconsistent as long as there is a chance of misclassification. This note introduces a general method of estimation after clustering. This estimation procedure subsumes the reasonable method when the identities of the observations are known.
A concept of fuzzy partition is employed here to bypass the problem of misclassification. Two examples will be discussed. One example will involve a parametric classification procedure and the other will involve a nonparametric clustering procedure called K-means. A Monte Carlo study will be conducted to compare the estimates arising from a classical clustering procedure and a fuzzy clustering procedure.
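
The contrast between classical and fuzzy clustering can be illustrated with a small simulation. The sketch below is not the estimator of this note, whose details the abstract does not give. It assumes two unit-variance normal sub-populations with equal mixing weights, both treated as known, and compares hard 2-means centres with centres from soft membership weights (an EM-style update swapped in as a stand-in for a fuzzy partition). All function names and parameter values are hypothetical.

```python
import math
import random


def sample_mixture(n, mu0, mu1, rng):
    """Draw n unlabelled points from 0.5*N(mu0, 1) + 0.5*N(mu1, 1)."""
    return [rng.gauss(mu1 if rng.random() < 0.5 else mu0, 1.0) for _ in range(n)]


def hard_centres(xs, iters=30):
    """1-D 2-means: every point belongs wholly to the nearer centre."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        cut = (c0 + c1) / 2.0
        g0 = [x for x in xs if x <= cut]
        g1 = [x for x in xs if x > cut]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1


def soft_centres(xs, iters=60):
    """Soft partition: each point carries a membership weight in [0, 1].

    Assumption: unit variances and equal mixing weights are known,
    so only the two centres are updated.
    """
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        w = []  # membership of each point in the upper sub-population
        for x in xs:
            a = math.exp(-0.5 * (x - c1) ** 2)
            b = math.exp(-0.5 * (x - c0) ** 2)
            w.append(a / (a + b))
        s1 = sum(w)
        s0 = len(xs) - s1
        c1 = sum(wi * x for wi, x in zip(w, xs)) / s1
        c0 = sum((1.0 - wi) * x for wi, x in zip(w, xs)) / s0
    return c0, c1


def compare(reps=10, n=1000, mu0=0.0, mu1=2.0, seed=0):
    """Monte Carlo average of the estimated gap c1 - c0 under both schemes."""
    rng = random.Random(seed)
    hard_gap = soft_gap = 0.0
    for _ in range(reps):
        xs = sample_mixture(n, mu0, mu1, rng)
        h0, h1 = hard_centres(xs)
        s0, s1 = soft_centres(xs)
        hard_gap += (h1 - h0) / reps
        soft_gap += (s1 - s0) / reps
    return hard_gap, soft_gap


if __name__ == "__main__":
    hard_gap, soft_gap = compare()
    print("true gap 2.0, hard:", round(hard_gap, 3), "soft:", round(soft_gap, 3))
```

Under these assumptions the hard-assignment gap tends to overshoot the true gap, because each cluster loses the tail of its own sub-population across the cut and gains the opposite tail of the other. Taking more observations does not remove this effect, which is the kind of inconsistency the abstract attributes to misclassification.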
