Abstract
This paper presents a mathematical foundation for neural network learning. Hierarchical learning machines such as neural networks and Gaussian mixtures are non-identifiable, with the result that conventional statistical asymptotic theory cannot be applied to them. For such learning machines, Bayesian estimation is more appropriate than the maximum likelihood method. We explain why Bayesian estimation makes neural networks useful in practical applications.
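As a minimal illustration of the non-identifiability at issue (an example of our own, not taken from the paper), consider a two-component Gaussian mixture

\[
  p(x \mid a, \mu_1, \mu_2) \;=\; a\,\varphi(x-\mu_1) \;+\; (1-a)\,\varphi(x-\mu_2),
  \qquad \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.
\]

Whenever \(\mu_1 = \mu_2\) (or \(a \in \{0,1\}\)), distinct parameter values realize the same density, so the map from parameters to distributions is not one-to-one; at such points the Fisher information matrix is singular, and the classical asymptotic normality of the maximum likelihood estimator no longer holds.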