Transactions of the Society of Instrument and Control Engineers (計測自動制御学会論文集)
Online ISSN : 1883-8189
Print ISSN : 0453-4654
ISSN-L : 0453-4654
A Study on the Learning Behaviour of Stochastic Automata (確率オートマトンの学習的性能に関する研究)
馬場 則夫 (Norio Baba), 椹木 義一 (Yoshikazu Sawaragi)

1974, Volume 10, Issue 1, pp. 78-85

Abstract
This paper discusses the learning behaviour of variable-structure stochastic automata operating in a stationary random environment.
A new reinforcement scheme of the reward-penalty type, called TNP, is proposed and shown to ensure ε-optimality in every stationary random environment. The proof relies on a semi-martingale inequality together with involved algebraic manipulations. Two previously proposed reinforcement schemes, Lr-1 and T1, are also examined from the same point of view.
The TNP scheme has two advantages over Lr-1 and T1:
(1) The Lr-1 scheme is of the reward-inaction type, whereas TNP is of the reward-penalty type.
(2) The T1 scheme ensures ε-optimality only under certain conditions, whereas TNP ensures it in all stationary random environments.
Computer simulation results also indicate that, of the schemes compared, TNP achieves the most effective learning behaviour.
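
To make the term "reward-inaction" concrete, the following is a minimal sketch of a variable-structure automaton updated by the textbook linear reward-inaction rule in a stationary random environment. It is not the TNP scheme, whose update equations the abstract does not give, and it is only an assumption that Lr-1 denotes this textbook rule. The function names, the step size `a`, and the penalty probabilities are hypothetical and chosen purely for illustration.

```python
import random


def reward_inaction_step(p, chosen, rewarded, a=0.05):
    """Apply one linear reward-inaction update to the action probabilities p.

    On reward, probability mass moves toward the chosen action.
    On penalty, p is left unchanged (the "inaction" part).
    """
    if not rewarded:
        return list(p)
    q = [(1.0 - a) * pj for pj in p]            # shrink every action
    q[chosen] = p[chosen] + a * (1.0 - p[chosen])  # boost the chosen one
    return q


def simulate(penalty_probs, steps=5000, a=0.05, seed=0):
    """Run the automaton against fixed (stationary) penalty probabilities."""
    rng = random.Random(seed)
    n = len(penalty_probs)
    p = [1.0 / n] * n                            # start from a uniform choice
    for _ in range(steps):
        chosen = rng.choices(range(n), weights=p)[0]
        rewarded = rng.random() >= penalty_probs[chosen]
        p = reward_inaction_step(p, chosen, rewarded, a)
    return p


if __name__ == "__main__":
    # Hypothetical environment: action 0 is penalised least often.
    print(simulate([0.2, 0.6, 0.8]))
```

Because the update ignores penalties, the probability vector moves only when an action is rewarded; a reward-penalty scheme such as TNP, by contrast, also updates on penalty responses.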
© The Society of Instrument and Control Engineers (社団法人 計測自動制御学会)