Improvement of Representativeness of Training Data by EM Algorithm

Y. Iikura; Y. Yasuoka

doi:10.11440/rssj1981.9.4_341

Abstract

For supervised classification of multispectral images, it is of primary importance to select an appropriate training data set for the categories to be classified. However, because the training data set is not selected by a statistical procedure such as random sampling, the estimated distribution parameters for each category are often biased.
This paper discusses the correction of the biased estimates obtained from the training data set by the EM algorithm, an iterative procedure for obtaining maximum-likelihood estimates in incomplete-data problems. For this purpose, the correction of biased estimates is formulated mathematically as a mixture density problem, in which the training data and the non-training data are regarded as the complete data and the incomplete data, respectively. In the iterative procedure, the incomplete data are treated as pseudo-complete data weighted by their posterior probabilities (E step), and these probabilities are in turn used to estimate the distribution parameters by the maximum-likelihood method (M step).
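
The E step and M step described above can be illustrated with a minimal sketch. It assumes each category follows a multivariate normal distribution, that training pixels carry fixed one-hot memberships, and that non-training pixels receive posterior memberships. The function names (`gaussian_pdf`, `e_step`, `m_step`, `em_with_training_data`), the `ridge` regularizer, and the fixed iteration count are hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distance
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def e_step(x, priors, means, covs):
    """Posterior probability of each category for each (unlabeled) pixel."""
    joint = np.column_stack([
        priors[k] * gaussian_pdf(x, means[k], covs[k])
        for k in range(len(priors))
    ])
    return joint / joint.sum(axis=1, keepdims=True)

def m_step(x, resp, ridge=1e-6):
    """Weighted maximum-likelihood estimates of priors, means, covariances."""
    n_k = resp.sum(axis=0)                 # effective pixel count per category
    priors = n_k / n_k.sum()
    means = (resp.T @ x) / n_k[:, None]
    covs = []
    for k in range(resp.shape[1]):
        diff = x - means[k]
        cov = (resp[:, k, None] * diff).T @ diff / n_k[k]
        covs.append(cov + ridge * np.eye(x.shape[1]))  # assumed regularizer
    return priors, means, np.array(covs)

def em_with_training_data(x_train, y_train, x_unlabeled, n_classes, n_iter=50):
    """EM where training pixels are complete data and the rest are incomplete."""
    resp_train = np.eye(n_classes)[y_train]        # fixed one-hot memberships
    # Initial (possibly biased) estimates from the training data alone.
    priors, means, covs = m_step(x_train, resp_train)
    x_all = np.vstack([x_train, x_unlabeled])
    for _ in range(n_iter):
        resp_unl = e_step(x_unlabeled, priors, means, covs)       # E step
        resp_all = np.vstack([resp_train, resp_unl])
        priors, means, covs = m_step(x_all, resp_all)             # M step
    return priors, means, covs
```

Here `y_train` is assumed to hold integer category labels from 0 to `n_classes - 1`. The sketch evaluates densities directly rather than in log space, so it is only suitable for small, well-scaled examples.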
It is found that applying the EM algorithm to multispectral images gives rise to the following two problems: (1) the algorithm becomes markedly inefficient if all pixels in the image are used as the incomplete data, and (2) the results depend on the number of training data selected.
The algorithm is modified by introducing a reliability index for the training data so that it gives stable estimates even when only a small number of pixels is used. It is shown that the modified algorithm can be applied successfully to the classification of Landsat TM data without losing the good properties of the original EM algorithm.
