1998, Vol. 13, No. 6, pp. 971-980
Reinforcement learning (RL) is the class of learning in which an autonomous agent obtains a policy for interacting with its environment, guided only by a signal telling it whether its past interactions were adequate. Most RL algorithms aim to obtain an optimal controller, a specification that is unreasonable and often futile because of the conflict between exploration and exploitation. This paper proposes a new RL framework, satisficing RL, shows that aiming to satisfice is a reasonable specification free of this conflict, and presents an RL system that is mathematically guaranteed to satisfice under nearly minimal constraints. A worked example helps convey the idea of satisficing RL, and the guarantee of satisficing is stated as a convergence theorem. Other features of the RL system are also described, while estimation of the convergence rate is left as future work. Since the real world contains a vast number of states, discussions of real problems should assume the state set to be infinite. On the other hand, an intelligent agent needs working memories that hold information about the environment. For this reason, this paper also proposes a way to satisfice in environments with perceptual aliasing while using finite memories efficiently.
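To convey the contrast between optimizing and satisficing that the abstract describes, the following is a minimal sketch on a toy multi-armed bandit. All names here (the bandit setting, the aspiration level, the switching rule) are illustrative assumptions, not the paper's actual algorithm: the agent stays with its current action while the action's estimated value remains at or above an aspiration level, and switches otherwise, rather than searching for the optimum.

```python
import random

def satisficing_bandit(true_values, aspiration, steps=1000, seed=0):
    """Hedged sketch of a satisficing policy on a Bernoulli bandit.

    Assumed rule (not from the paper): keep pulling the current arm
    while its running value estimate is >= aspiration; otherwise pick
    another arm at random. The agent never tries to verify optimality,
    only "good enough" relative to the aspiration level.
    """
    rng = random.Random(seed)
    n = len(true_values)
    estimates = [0.0] * n   # running mean reward per arm
    counts = [0] * n        # pulls per arm
    arm = rng.randrange(n)
    total = 0.0
    for _ in range(steps):
        # Bernoulli reward with success probability true_values[arm]
        reward = 1.0 if rng.random() < true_values[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
        # Satisfice: switch only when the current arm no longer
        # looks good enough
        if estimates[arm] < aspiration:
            arm = rng.randrange(n)
    return total / steps

# An arm with value 0.9 exceeds the aspiration 0.7, so the agent
# eventually settles there without ever proving it is optimal.
avg = satisficing_bandit([0.2, 0.5, 0.9], aspiration=0.7)
```

The design point mirrored here is the one the abstract makes: because the stopping criterion is "meets the aspiration level" rather than "provably best", exploration can cease as soon as a satisfactory action is found, avoiding the exploration-exploitation contradiction that an optimality specification forces.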