A Markovian decision process with estimation of unknown transition probabilities is an important learning control model for stochastic systems in a wide range of applications. Many studies have been devoted to the Markovian decision problem, and various schemes have been presented. Most of them rest on the assumption that the process is stationary, in other words, that the transition probabilities are constant over time. In practice, however, they are not generally constant. Accordingly, it is of practical significance to consider the problem for nonstationary processes. We are particularly interested in the case of cyclic processes, because many real systems are affected by external conditions arising from cyclic natural phenomena and/or habitual human behavior.
In view of this, we present a scheme of estimation and control for the problem on the assumption that the unknown probabilities are governed by a parameter whose value changes with cycle T. Under this scheme, every T time instants we estimate the parameter and then determine the control actions to be chosen over the next T instants. The basic idea behind the scheme is to replace the cyclic process with a stationary process by regarding T instants as a unit time interval and introducing an "extended parameter". It is proved that the maximum likelihood estimate of the extended parameter converges to the true value, and consequently the scheme asymptotically attains control that is optimal in a certain sense.