In this paper, we review the maximum likelihood method for estimating the statistical parameters that specify a probabilistic model and show that it generally gives an optimal estimator, one that asymptotically attains the minimum mean square error. Thus, for most applications in the information sciences, maximum likelihood estimation suffices. The Fisher information matrix, which defines the orthogonality between parameters in a probabilistic model, arises naturally from maximum likelihood estimation. Because the inverse of the Fisher information matrix gives the covariance matrix of the estimation errors of the parameters, orthogonalizing the parameters guarantees that their estimates are distributed independently of one another. The theory of information geometry provides procedures to orthogonalize the parameters globally, that is, at all parameter values, at least for the exponential and mixture families of distributions. Global orthogonalization gives a simpler and clearer view of statistical inference and, for example, makes it possible to perform a statistical test for each unknown parameter separately. Therefore, for practical applications, a good starting point is to examine whether the probabilistic model under study belongs to one of these families.
© 2011 by the Graduate School of Information Sciences (GSIS), Tohoku University
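
The relation between the Fisher information matrix and the estimation errors described in the abstract can be sketched in symbols. The notation below is an assumption introduced only for illustration and is not taken from the paper: $p(x;\theta)$ denotes the model density, $\theta = (\theta^1, \dots, \theta^k)$ the parameters, $\hat{\theta}$ the maximum likelihood estimate from $n$ independent observations, and $G(\theta) = (g_{ij}(\theta))$ the Fisher information matrix. The asymptotic statement holds under the usual regularity conditions.

```latex
% Fisher information matrix of the model p(x; \theta)
g_{ij}(\theta)
  = \mathrm{E}\!\left[
      \frac{\partial \log p(x;\theta)}{\partial \theta^i}\,
      \frac{\partial \log p(x;\theta)}{\partial \theta^j}
    \right]

% Asymptotic covariance of the maximum likelihood estimate
\mathrm{Cov}\bigl(\hat{\theta}\bigr) \approx \frac{1}{n}\, G(\theta)^{-1}
  \qquad (n \to \infty)

% Orthogonal parameters: g_{ij}(\theta) = 0 for i \neq j, so that
G(\theta)^{-1}
  = \mathrm{diag}\!\left( \frac{1}{g_{11}(\theta)}, \dots, \frac{1}{g_{kk}(\theta)} \right)

% Example: normal distribution with parameters (\mu, \sigma)
G(\mu, \sigma)
  = \begin{pmatrix} 1/\sigma^{2} & 0 \\ 0 & 2/\sigma^{2} \end{pmatrix},
  \qquad
\mathrm{Cov}\bigl(\hat{\mu}, \hat{\sigma}\bigr)
  \approx \frac{1}{n}
    \begin{pmatrix} \sigma^{2} & 0 \\ 0 & \sigma^{2}/2 \end{pmatrix}
```

In the normal example the off-diagonal entries vanish at every value of $(\mu, \sigma)$, so the two parameters are orthogonal everywhere and the errors in $\hat{\mu}$ and $\hat{\sigma}$ are asymptotically uncorrelated. This is the kind of situation in which each parameter can be tested separately.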