Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
In this paper, we discuss c-means clustering algorithms on the multinomial manifold. Data form a Riemannian manifold equipped with the Fisher information metric through a probabilistic mapping from each datum to a probability distribution. For discrete data, the statistical manifold of the multinomial distribution is appropriate. In general, the Euclidean distance is not appropriate on this manifold because the parameter space of the distribution is not flat. We apply the Kullback-Leibler (KL) divergence or the Hellinger distance, as approximations of the geodesic distance, to hard c-means and fuzzy c-means.
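As an illustration of the dissimilarities named above, the following minimal sketch computes the KL divergence, the Hellinger distance, and the Fisher geodesic distance between multinomial parameter vectors, and uses them in a hard c-means assignment step. It is not the paper's implementation: the function names, the sample points, and the fixed cluster centers are assumptions made for this example, and the center-update rule is omitted because the abstract does not state it.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions.
    Terms with p_i == 0 contribute 0; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger_distance(p, q):
    """Hellinger distance, normalized to lie in [0, 1]."""
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s / 2)

def fisher_geodesic(p, q):
    """Geodesic distance on the multinomial manifold under the Fisher metric:
    2 * arccos of the Bhattacharyya coefficient (clamped against rounding)."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2 * math.acos(min(1.0, bc))

def assign(points, centers, dist):
    """Hard c-means assignment step: index of the nearest center per point."""
    return [min(range(len(centers)), key=lambda k: dist(x, centers[k]))
            for x in points]

# Hypothetical data: three points on the probability simplex, two fixed centers.
points = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
centers = [[0.65, 0.25, 0.10], [0.10, 0.25, 0.65]]

labels_kl = assign(points, centers, kl_divergence)
labels_h = assign(points, centers, hellinger_distance)
labels_g = assign(points, centers, fisher_geodesic)
```

Because `assign` takes the dissimilarity as an argument, the same assignment step can be run with any of the three measures and the resulting labels compared.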