Abstract
A new class of Expectation-Maximization (EM) algorithms is presented and applied to probabilistic learning. The algorithm can be derived from the non-negativity of the α-divergence together with Bayesian computation. The design parameter α specifies a prior probability weight for the learning. Accordingly, the algorithm is called the α-Weighted EM algorithm (WEM, α-EM). The traditional EM algorithm (log-EM) corresponds to the special case of α = -1. Besides the WEM, a practically more useful version, the W-GEM (Weighted GEM), which updates the parameters gradually, is presented. The algorithm is then applied to learning in mixture-of-probability neural networks. In the discussion of the update equations, extensions of basic statistical notions such as Fisher's efficient score, the Fisher information matrix, and the Cramér-Rao inequality are given. Therein, the concept of an aptitude number is defined in terms of the number α. Numerical experiments on such networks show that the WEM (W-GEM) outperforms the log-EM in learning speed for appropriate aptitude numbers.
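Although the abstract does not spell out the divergence convention, the stated reduction at α = -1 is consistent with an α-logarithm of the following form (a sketch of one common normalization, not necessarily the exact one used in the paper):

\[
  L^{(\alpha)}(x) \;=\; \frac{2}{1+\alpha}\Bigl(x^{\frac{1+\alpha}{2}} - 1\Bigr),
  \qquad
  \lim_{\alpha \to -1} L^{(\alpha)}(x)
  \;=\; \lim_{t \to 0} \frac{x^{t} - 1}{t}
  \;=\; \log x ,
  \qquad t = \tfrac{1+\alpha}{2},
\]

so a surrogate objective built from L^{(α)} reduces to the ordinary log-likelihood, and hence the α-weighted update reduces to the log-EM update, at α = -1.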
This paper also unveils another new method: each WEM structure can be used as a building block to generate a learning systolic array. Thus, functionally distributed systems can be created by appending such blocks as monitors.