人工知能
Online ISSN : 2435-8614
Print ISSN : 2188-2266
人工知能学会誌 (Journal of the Japanese Society for Artificial Intelligence; 1986–2013, Print ISSN: 0912-8085)
A Model-Based Approach to Reinforcement Learning in Partially Observable Environments: Bayesian Learning of Variable Memory Length Models
末松 伸朗, 林 朗, 李 仕剛
Commentary / general-interest journal article · Free access

1998, Volume 13, Issue 3, pp. 404–414

Abstract

Most reinforcement learning (RL) algorithms assume that the learning process of an embedded agent can be formulated as a Markov decision process (MDP). However, this assumption does not hold for many realistic problems, so RL techniques for non-Markovian environments have recently been attracting attention. We have developed a Bayesian approach to RL in non-Markovian environments, in which the environment is modeled as a history tree model, a stochastic model with variable memory length. In our approach, given a class of history trees, the agent explores the environment and learns the maximum a posteriori (MAP) model on the basis of Bayesian statistics. After the agent has learned the environment model, the optimal policy can be computed by dynamic programming. Unlike many other model-learning techniques, our approach does not suffer from the problems of noise and overfitting, thanks to the Bayesian framework. We have analyzed the asymptotic behavior of the proposed algorithm and proved that if the given class contains the exact model of the environment, the model learned by our algorithm converges to it. We also present the results of experiments in two non-Markovian environments.
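
The abstract describes the method only at a high level. As a rough illustration of the central idea, here is a minimal sketch, in Python, of MAP learning of a history tree (a variable memory length model) from a single observation stream, scoring each candidate context with a Dirichlet-multinomial marginal likelihood and a context-tree-weighting style structural prior. Everything here (ALPHA, MAX_DEPTH, the log(1/2) prior on the leaf-vs-split choice, and all function names) is an illustrative assumption, not the authors' algorithm, which additionally conditions on actions and rewards.

```python
import math
from collections import defaultdict

# Hyperparameters (assumptions for this sketch, not taken from the paper):
ALPHA = 0.5      # symmetric Dirichlet prior over next-observation distributions
MAX_DEPTH = 3    # maximum memory length considered
N_SYMBOLS = 2    # size of the observation alphabet


def log_marginal(counts):
    """Log Dirichlet-multinomial marginal likelihood of the next-symbol
    counts gathered at one context node (Krichevsky-Trofimov style)."""
    total = sum(counts.values())
    lp = math.lgamma(N_SYMBOLS * ALPHA) - math.lgamma(N_SYMBOLS * ALPHA + total)
    for s in range(N_SYMBOLS):
        c = counts.get(s, 0)
        lp += math.lgamma(ALPHA + c) - math.lgamma(ALPHA)
    return lp


def collect_counts(seq):
    """For every context (the d most recent symbols, most recent first,
    d = 0..MAX_DEPTH), count which symbol followed it in the sequence.
    Starting at t = MAX_DEPTH keeps child counts consistent with parents."""
    counts = defaultdict(dict)
    for t in range(MAX_DEPTH, len(seq)):
        for d in range(MAX_DEPTH + 1):
            ctx = tuple(reversed(seq[t - d:t]))
            counts[ctx][seq[t]] = counts[ctx].get(seq[t], 0) + 1
    return counts


def map_subtree(ctx, counts, depth):
    """Return (log posterior score, tree) for the MAP subtree rooted at
    context `ctx`. Structural prior: each node below MAX_DEPTH pays
    log(1/2) for the binary leaf-vs-split choice."""
    node_counts = counts.get(ctx, {})
    if depth == MAX_DEPTH:
        return log_marginal(node_counts), {"leaf": node_counts}
    log_half = math.log(0.5)
    leaf_score = log_half + log_marginal(node_counts)
    split_score = log_half
    children = {}
    for s in range(N_SYMBOLS):
        sc, sub = map_subtree(ctx + (s,), counts, depth + 1)
        split_score += sc
        children[s] = sub
    if split_score > leaf_score:
        return split_score, {"children": children}
    return leaf_score, {"leaf": node_counts}


# Toy usage: the stream 0,1,1,0,1,1,... needs memory length 2 after a "1"
# (is the previous symbol 0 or 1?) but only length 1 after a "0".
if __name__ == "__main__":
    seq = [0, 1, 1] * 60
    counts = collect_counts(seq)
    score, tree = map_subtree((), counts, 0)
    print(score)
    print(tree)
```

Once the MAP tree is fixed, its leaf contexts form a finite state set, so the optimal policy mentioned in the abstract could, in principle, be computed over those states by dynamic programming (e.g., value iteration), provided the model also captures how actions and rewards enter the dynamics.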

© 1998 The Japanese Society for Artificial Intelligence