人工知能 (Journal of the Japanese Society for Artificial Intelligence)
Online ISSN: 2435-8614
Print ISSN: 2188-2266
Formerly 人工知能学会誌 (1986-2013, Print ISSN: 0912-8085)
A Stochastic Exploration Strategy for Reinforcement Learning Based on the Satisficing Principle
片山 晋, 武市 正人, 小林 重信

1998, Vol. 13, No. 6, pp. 971-980

Abstract

Reinforcement learning (RL) is a class of learning in which an autonomous agent acquires a policy for interacting with its environment, guided only by a signal that tells the agent whether its past interactions were adequate. Most RL algorithms aim to obtain an optimal controller, a specification that is unreasonable and often futile because of the conflict between exploration and exploitation. This paper proposes a new framework for RL, satisficing RL, shows that aiming to satisfice is a reasonable specification free from this conflict, and presents an RL system that is mathematically guaranteed to satisfice under nearly minimal constraints. A worked example helps convey the idea of satisficing RL, and the guarantee of satisficing is stated as a convergence theorem. Other features of the RL system are also described, while estimation of the convergence rate is left as future work. Since the real world contains a vast number of states, real problems should be modeled with an infinite state set; on the other hand, an intelligent agent needs working memories that hold information about its environment. For this reason, the paper also proposes a way to satisfice in environments with perceptual aliasing while using finite memories efficiently.
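To make the satisficing idea concrete, the following is a minimal illustrative sketch, not the algorithm of the paper: in a one-state bandit setting, the agent exploits its best-known action once that action's estimated value reaches an aspiration level, and explores at random otherwise. The function names, the `aspiration` parameter, and the incremental value update are all assumptions introduced for illustration.

```python
import random

def satisficing_action(q_values, aspiration, rng=random):
    """Illustrative satisficing rule (hypothetical, not the paper's method):
    exploit the best-known action if it meets the aspiration level;
    otherwise keep exploring uniformly at random."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    if q_values[best] >= aspiration:
        return best  # satisficed: no further exploration needed
    return rng.randrange(len(q_values))  # not yet satisficed: explore

def run_bandit(true_means, aspiration, steps=2000, alpha=0.1, seed=0):
    """Run the sketch on a Gaussian-reward bandit with the given arm means."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)  # value estimates, initialized pessimistically
    for _ in range(steps):
        a = satisficing_action(q, aspiration, rng)
        reward = true_means[a] + rng.gauss(0.0, 0.1)
        q[a] += alpha * (reward - q[a])  # incremental estimate update
    return q
```

Unlike an optimizing agent, this agent stops exploring as soon as any arm's estimate clears the aspiration level, which is the sense in which satisficing sidesteps the exploration-exploitation conflict; whether the aspiration is attainable, of course, determines whether this termination ever occurs.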

© 1998 The Japanese Society for Artificial Intelligence (人工知能学会)