Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 3R5-GS-2-03
Purposive Exploration and Progressive Reference Control
*Keigo ISHIKURA, Jun KUME, Tatuji TAKAHASHI, Yu KONO
Abstract

Humans tend to pursue higher goals by gradually raising their objective level, and their trial and error toward a given goal is very quick. Together, these allow efficient, step-by-step optimization of behavior. In the context of reinforcement learning, the latter trial-and-error capability is supported by the Risk-sensitive Satisficing (RS) algorithm. However, the step-by-step updating of the objective level has received little discussion in a framework that combines the former with the latter. An advantage of having an explicit objective is that prior knowledge can be used to set it: for animals, it corresponds to foraging with caloric expenditure as a minimum criterion; in industrial applications, to operating costs or numerical targets for investors. If the goal is achieved, the agent adjusts the target upward; if it proves unattainable, the agent adjusts it downward. The scheme is also flexible in that goals can be revised on the basis of hearsay information, for example when another agent is reported to have achieved a better performance. In this study, we examine the combination of goal-directed search, RS, and gradual modification of the objective level through simulations of the bandit problem. We propose a natural formulation that optimizes behavior efficiently by setting an initial objective level corresponding to a prior distribution based on prior knowledge and body structure.
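The satisficing behavior described above can be sketched as a simple bandit simulation. This is an illustrative simplification, not the paper's implementation: the RS action value is taken in its commonly cited form RS_i = n_i (E_i − ℵ), where n_i is a pull count, E_i the empirical mean, and ℵ the aspiration (objective) level; the arm probabilities, the aspiration value, the optimistic pseudo-counts, and the optional aspiration-update rule are all assumptions chosen for the sketch.

```python
import random

def rs_bandit(probs, aspiration, steps=10000, seed=0, adapt=False, eta=0.001):
    """Simulate Risk-sensitive Satisficing (RS) on a Bernoulli bandit.

    RS value for arm i: RS_i = n_i * (E_i - aspiration).
    The agent pulls the arm with the highest RS value (ties broken
    at random): arms estimated above the aspiration are exploited,
    while all-below-aspiration situations trigger exploration.
    """
    rng = random.Random(seed)
    k = len(probs)
    n = [1] * k    # pseudo-count of one pull per arm, so every arm is considered
    e = [0.5] * k  # initial mean estimates (stand-in for prior knowledge)
    total = 0.0
    for _ in range(steps):
        rs = [n[i] * (e[i] - aspiration) for i in range(k)]
        best = max(rs)
        i = rng.choice([j for j in range(k) if rs[j] == best])
        r = 1.0 if rng.random() < probs[i] else 0.0
        e[i] += (r - e[i]) / (n[i] + 1)  # incremental mean update
        n[i] += 1
        total += r
        if adapt:
            # one hypothetical form of the gradual objective update:
            # drift the aspiration toward the best current estimate,
            # raising it when the goal is exceeded and lowering it
            # when it appears unattainable
            aspiration += eta * (max(e) - aspiration)
    return total / steps

# With an aspiration between the two arms' success rates, RS locks
# onto the satisficing (better) arm and the mean reward approaches it.
mean_reward = rs_bandit([0.2, 0.8], aspiration=0.5)
```

The `adapt` branch is only one possible reading of "step-by-step updating of the objective level"; the paper's actual update rule may differ.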

© 2023 The Japanese Society for Artificial Intelligence