1998, Volume 13, Issue 3, pp. 404-414
Most reinforcement learning (RL) algorithms assume that the learning process of an embedded agent can be formulated as a Markov Decision Process (MDP). However, this assumption does not hold for many realistic problems, and RL techniques for non-Markovian environments have therefore been attracting increasing attention. We have developed a Bayesian approach to RL in non-Markovian environments, in which the environment is modeled as a history tree model, a stochastic model with variable memory length. In our approach, given a class of history trees, the agent explores the environment and learns the maximum a posteriori (MAP) model on the basis of Bayesian statistics. Once the agent has learned the environment model, the optimal policy can be computed by dynamic programming. Unlike many other model-learning techniques, our approach does not suffer from noise or overfitting, thanks to the Bayesian framework. We have analyzed the asymptotic behavior of the proposed algorithm and proved that if the given class contains the exact model of the environment, the model learned by our algorithm converges to it. We also present experimental results in two non-Markovian environments.
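As a rough illustration of the two steps described in the abstract, the Python sketch below (not the paper's implementation) scores candidate history-tree contexts with a Bayesian marginal likelihood and then runs standard value iteration over the selected leaf contexts. The symmetric Dirichlet hyperparameter ALPHA, the greedy context-splitting rule, and all function names (map_history_tree, value_iteration) are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch, assuming a symmetric Dirichlet prior over next-symbol
# probabilities and a greedy splitting rule; the paper's MAP search over a
# given class of history trees may differ.

from collections import defaultdict
from math import lgamma

ALPHA = 1.0  # assumed symmetric Dirichlet hyperparameter


def log_marginal(counts, n_symbols, alpha=ALPHA):
    """Log marginal likelihood of next-symbol counts under a Dirichlet prior."""
    total = sum(counts.values())
    score = lgamma(n_symbols * alpha) - lgamma(n_symbols * alpha + total)
    for c in counts.values():
        score += lgamma(alpha + c) - lgamma(alpha)
    return score


def map_history_tree(sequences, symbols, max_depth):
    """Greedily grow the set of leaf contexts (suffixes of the history) as long
    as splitting a context by one more past symbol raises the posterior score.
    A full MAP search would also weigh a prior over tree structures."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for t in range(len(seq) - 1):
            for d in range(max_depth + 1):
                if d > t + 1:
                    break
                ctx = tuple(seq[t + 1 - d:t + 1])  # last d symbols; () if d == 0
                counts[ctx][seq[t + 1]] += 1

    def score(ctx):
        return log_marginal(counts[ctx], len(symbols))

    leaves, changed = [()], True
    while changed:
        changed = False
        for ctx in list(leaves):
            if len(ctx) >= max_depth:
                continue
            children = [(s,) + ctx for s in symbols if counts.get((s,) + ctx)]
            if children and sum(score(c) for c in children) > score(ctx):
                leaves.remove(ctx)
                leaves.extend(children)
                changed = True
    return leaves, counts


def value_iteration(P, R, gamma=0.95, sweeps=200):
    """Plain value iteration: P[s][a] maps next states to probabilities,
    R[s][a] is the expected immediate reward; states are leaf contexts."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                    for a in P[s])
             for s in P}
    policy = {s: max(P[s], key=lambda a: R[s][a] + gamma *
                     sum(p * V[s2] for s2, p in P[s][a].items()))
              for s in P}
    return V, policy
```

For example, map_history_tree([[0, 1, 0, 1, 0, 1]], symbols=[0, 1], max_depth=2) returns the contexts whose one-step-deeper refinement improves the marginal-likelihood score; the greedy refinement here is only a stand-in for the exhaustive MAP selection over the given model class that the abstract describes.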