Abstract
To assess the convergence properties of reinforcement learning algorithms, we formulate the learning scheme as a discrete Markov process and transform its evolution equation into a continuous-time master equation. Treating the learning parameter as small, we carry out a small-perturbation expansion of the master equation and obtain a Fokker-Planck equation approximation at low order in the learning parameter. We show that the global features of reinforcement schemes for learning automata can be described within this approximation, because the deterministic term of the dynamics has a globally asymptotically stable fixed point.