Journal of the Japanese Society for Artificial Intelligence
Online ISSN: 2435-8614
Print ISSN: 2188-2266 (0912-8085 until 2013)
A Stochastic Exploration Strategy for Satisficing Reinforcement Learning
Susumu KATAYAMA, Masato TAKEICHI, Shigenobu KOBAYASHI

1998 Volume 13 Issue 6 Pages 971-980

Abstract

Reinforcement learning (RL) is the class of learning problems in which an autonomous agent acquires a policy for interacting with its environment, guided only by a signal indicating whether its past interactions were adequate. Most RL algorithms aim to obtain an optimal controller, a specification that is unreasonable and often unattainable because of the conflict between exploration and exploitation. This paper proposes a new framework, satisficing RL, shows that aiming to satisfice is a reasonable specification free of this conflict, and presents an RL system that is mathematically guaranteed to satisfice under nearly minimal constraints. A worked example illustrates the idea of satisficing RL, and the guarantee of satisficing is stated as a convergence theorem. Other features of the system are also described, while estimation of the convergence rate is left as future work. Because the real world contains an enormous number of states, real problems should be discussed under the assumption that the state set is infinite. At the same time, an intelligent agent needs working memories that hold information about the environment. This paper therefore also proposes a method for satisficing in environments with perceptual aliasing that uses finite memories efficiently.
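To make the contrast with optimizing concrete, the following is a minimal sketch of satisficing action selection on a two-armed bandit: the agent exploits its greedy estimate only when that estimate meets a fixed aspiration level, and otherwise explores at random. This is an illustrative assumption for exposition, not the authors' algorithm; the aspiration level, the incremental-mean value update, and the bandit setting are all choices made here for the sketch.

```python
import random

def satisficing_action(q_values, aspiration, rng):
    """Pick the greedy action if its estimated value meets the
    aspiration level; otherwise explore uniformly at random.
    (Hypothetical helper illustrating satisficing, not the paper's method.)"""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    if q_values[best] >= aspiration:
        return best                          # satisficed: exploit
    return rng.randrange(len(q_values))      # not satisficed: explore

# Tiny two-armed Bernoulli bandit demonstration (assumed setup).
rng = random.Random(0)
true_means = [0.2, 0.8]   # arm 1 is the only satisficing arm
aspiration = 0.5
q = [0.0, 0.0]            # value estimates
counts = [0, 0]
for _ in range(2000):
    a = satisficing_action(q, aspiration, rng)
    r = 1.0 if rng.random() < true_means[a] else 0.0
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]  # incremental sample-mean update
```

Once the estimate for arm 1 rises above the aspiration level, the agent settles on it and stops exploring, even though it never verifies that arm 1 is optimal; that is the sense in which satisficing avoids the exploration/exploitation conflict.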

© 1998 The Japanese Society for Artificial Intelligence