Abstract
This paper considers a system that is observed periodically and found to be in one of R possible states. After each observation, one of A possible actions is chosen.
First, such Markov processes are discussed in Section 3.1, and the work of R.A. Howard is extended. Next, the system relations are assumed to depend on a random vector that forms an m-th order homogeneous Markov chain with unknown transition probabilities. Namely, let {V_t}, t = 0, 1, 2, …, be a sequence of events, taken in this paper to be the random vector V_t = (U_t, Y_t), and suppose that Pr(Y_{t+1} = y_j | V_0, V_1, …, V_t, U_{t+1} = u_k) = Pr(Y_{t+1} = y_j | V_{t-m+1}, …, V_t, U_{t+1} = u_k), where {U_t}, t = 0, 1, 2, …, is a sequence of control actions and {Y_t}, t = 0, 1, 2, …, is a sequence of measurements. An algorithm is formulated and used to design a controller without identifying the transition probabilities. Finally, the case in which the order of the Markov chain is unknown is discussed in Section 4.
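The m-th order assumption stated above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the algorithm of this paper: the names `simulate`, `kernel`, and `policy` are hypothetical, and the conditional distribution is supplied explicitly purely for illustration, whereas the controller described here is designed without identifying it. The point shown is that the distribution of the next measurement Y_{t+1} is computed from the last m pairs (U, Y) and the next action U_{t+1} only.

```python
from collections import deque
import random


def simulate(m, actions, outputs, kernel, policy, steps, seed=0):
    """Simulate a controlled m-th order Markov chain of pairs V_t = (U_t, Y_t).

    kernel(window, u_next) -> list of probabilities over `outputs`
        (hypothetical stand-in for the unknown transition probabilities).
    policy(window) -> next action U_{t+1}, chosen from the same window.
    """
    rng = random.Random(seed)
    # Assumed initial history: m copies of the first action/output pair.
    window = deque([(actions[0], outputs[0])] * m, maxlen=m)
    trajectory = []
    for _ in range(steps):
        history = tuple(window)          # V_{t-m+1}, ..., V_t
        u_next = policy(history)         # U_{t+1}
        probs = kernel(history, u_next)  # Pr(Y_{t+1} = y_j | history, U_{t+1})
        y_next = rng.choices(outputs, weights=probs, k=1)[0]
        window.append((u_next, y_next))  # oldest pair drops out automatically
        trajectory.append((u_next, y_next))
    return trajectory
```

Because the `deque` has `maxlen=m`, anything older than m steps is discarded before the next draw, which is exactly the conditional-independence statement in the displayed equality.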