Abstract
In this paper, we consider an extension of β-type learning automata that achieves conditionally optimal performance in a stationary random environment whose responses share some characteristics of the P- and Q-models.
The learning scheme of β-type automata is based on finite models of the probability distributions that characterize the environment, as in Bayesian learning. Within this scheme, we propose a new class of finite models by extending the probability distributions to density functions belonging to the exponential family, so that the automata can operate in an S-model environment. We also define a new output function that determines the rule by which the automaton selects an action. Using the martingale convergence theorem, we prove that the new β-type automata are conditionally optimal. Several useful properties of the β-type automata are also examined through simulation tests, which show that the automata perform well in terms of their learning curves, that is, the evolution of the probability of selecting the optimal action.
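To make the general idea concrete, the following Python sketch shows one way a Bayesian learner over a finite set of exponential-family models could act in an S-model environment. It is not the β-type automaton of this paper. The candidate set `CANDIDATE_LAMBDAS`, the truncated-exponential density on [0, 1], the class `FiniteModelAutomaton`, and the output function are all hypothetical choices made for illustration. The output function selects each action with the posterior probability that the action has the smallest expected response, which is a probability-matching rule. The sketch also assumes that larger responses mean stronger penalties. At each step, the automaton selects an action, updates that action's posterior over the finite models by Bayes' rule, and records the probability assigned to the optimal action. That probability is the quantity a learning curve plots.

```python
import itertools
import math
import random

# Hypothetical finite model set: natural parameters of a truncated-exponential
# density f(x; lam) = lam * exp(lam * x) / (exp(lam) - 1) on [0, 1].
# lam = 0 (the uniform density) is left out to avoid the removable singularity.
CANDIDATE_LAMBDAS = [-4.0, -2.0, -1.0, 1.0, 2.0, 4.0]


def log_density(x, lam):
    """Log of the truncated-exponential density at a response x in [0, 1]."""
    return math.log(lam / math.expm1(lam)) + lam * x


def mean_response(lam):
    """E[x] = 1 / (1 - exp(-lam)) - 1 / lam for the density above."""
    return 1.0 / (-math.expm1(-lam)) - 1.0 / lam


def sample_response(lam, rng):
    """Inverse-CDF draw using F(x) = expm1(lam * x) / expm1(lam)."""
    return math.log1p(rng.random() * math.expm1(lam)) / lam


class FiniteModelAutomaton:
    """Hypothetical Bayesian automaton for an S-model environment.

    Assumed convention: a larger response is a stronger penalty, so the
    optimal action has the smallest expected response.
    """

    def __init__(self, n_actions, lambdas, rng):
        self.n = n_actions
        self.lambdas = lambdas
        self.means = [mean_response(lam) for lam in lambdas]
        self.rng = rng
        # Uniform prior over the finite model set, stored in log space per action.
        k = len(lambdas)
        self.log_post = [[-math.log(k)] * k for _ in range(n_actions)]

    def posterior(self, i):
        """Normalized posterior over the candidate models for action i."""
        m = max(self.log_post[i])
        w = [math.exp(v - m) for v in self.log_post[i]]
        total = sum(w)
        return [v / total for v in w]

    def update(self, i, x):
        """Bayes' rule: add the log-likelihood of x under each model, then renormalize."""
        lp = [v + log_density(x, lam) for v, lam in zip(self.log_post[i], self.lambdas)]
        m = max(lp)
        log_z = m + math.log(sum(math.exp(v - m) for v in lp))
        self.log_post[i] = [v - log_z for v in lp]

    def action_probabilities(self):
        """Hypothetical output function: p_i = P(action i has the smallest mean).

        Enumerates every joint model assignment; tied actions share the weight.
        """
        posts = [self.posterior(i) for i in range(self.n)]
        probs = [0.0] * self.n
        for combo in itertools.product(range(len(self.lambdas)), repeat=self.n):
            weight = math.prod(posts[i][k] for i, k in enumerate(combo))
            means = [self.means[k] for k in combo]
            best = min(means)
            winners = [i for i, v in enumerate(means) if v == best]
            for i in winners:
                probs[i] += weight / len(winners)
        return probs

    def step(self, true_lambdas):
        """Select an action from p, observe a response, update; return the p used."""
        p = self.action_probabilities()
        i = self.rng.choices(range(self.n), weights=p)[0]
        self.update(i, sample_response(true_lambdas[i], self.rng))
        return p


def learning_curve(true_lambdas, steps, seed=0):
    """Record the probability of the optimal action at every step."""
    rng = random.Random(seed)
    automaton = FiniteModelAutomaton(len(true_lambdas), CANDIDATE_LAMBDAS, rng)
    optimal = min(range(len(true_lambdas)), key=lambda i: mean_response(true_lambdas[i]))
    return [automaton.step(true_lambdas)[optimal] for _ in range(steps)]


if __name__ == "__main__":
    curve = learning_curve([2.0, -2.0, 1.0], steps=300)
    for t in (0, 50, 100, 200, 299):
        print(f"step {t:3d}: p(optimal) = {curve[t]:.3f}")
```

Running the module prints a few points of the learning curve for three hypothetical actions. Exact enumeration keeps this output function exact, but it costs K^r operations per step for K candidate models and r actions. A Monte Carlo estimate of the same probabilities would be the natural substitute for larger action sets.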