Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 3R5-GS-2-03
Purposive Exploration and Progressive Reference Control
*Keigo ISHIKURA, Jun KUME, Tatuji TAKAHASHI, Yu KONO
Abstract

Humans tend to pursue higher goals by gradually raising their objective level, and their trial and error toward a given goal is very quick. Together, these allow efficient, step-by-step optimization of behavior. In the context of reinforcement learning, the latter trial-and-error capability is supported by the Risk-sensitive Satisficing (RS) algorithm. However, the step-by-step updating of the objective level has received little discussion in a framework that combines the former with the latter. An advantage of having an explicit objective is that prior knowledge can be used to set it: for animals, it corresponds to foraging with caloric expenditure as a minimum criterion; in industrial applications, to operating costs or numerical targets for investors. If the goal is achieved, the agent adjusts the target upward; if it proves unattainable, the agent adjusts it downward. The scheme is also flexible in that goals can be revised on the basis of hearsay information, for example when another agent is reported to have achieved a better performance. In this study, we examine the combination of goal-directed search, RS, and gradual modification of the objective level through simulations of the bandit problem. We propose a natural formulation that optimizes behavior efficiently by setting an initial objective level corresponding to a prior distribution based on prior knowledge and body structure.
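The satisficing behavior described above can be sketched as a simple bandit simulation. This is an illustrative simplification, not the paper's implementation: the RS action value is taken in its commonly cited form RS_i = n_i (E_i − ℵ), where n_i is a pull count, E_i the empirical mean, and ℵ the aspiration (objective) level; the arm probabilities, the aspiration value, the optimistic pseudo-counts, and the optional aspiration-update rule are all assumptions chosen for the sketch.

```python
import random

def rs_bandit(probs, aspiration, steps=10000, seed=0, adapt=False, eta=0.001):
    """Simulate Risk-sensitive Satisficing (RS) on a Bernoulli bandit.

    RS value for arm i: RS_i = n_i * (E_i - aspiration).
    The agent pulls the arm with the highest RS value (ties broken
    at random): arms estimated above the aspiration are exploited,
    while all-below-aspiration situations trigger exploration.
    """
    rng = random.Random(seed)
    k = len(probs)
    n = [1] * k    # pseudo-count of one pull per arm, so every arm is considered
    e = [0.5] * k  # initial mean estimates (stand-in for prior knowledge)
    total = 0.0
    for _ in range(steps):
        rs = [n[i] * (e[i] - aspiration) for i in range(k)]
        best = max(rs)
        i = rng.choice([j for j in range(k) if rs[j] == best])
        r = 1.0 if rng.random() < probs[i] else 0.0
        e[i] += (r - e[i]) / (n[i] + 1)  # incremental mean update
        n[i] += 1
        total += r
        if adapt:
            # one hypothetical form of the gradual objective update:
            # drift the aspiration toward the best current estimate,
            # raising it when the goal is exceeded and lowering it
            # when it appears unattainable
            aspiration += eta * (max(e) - aspiration)
    return total / steps

# With an aspiration between the two arms' success rates, RS locks
# onto the satisficing (better) arm and the mean reward approaches it.
mean_reward = rs_bandit([0.2, 0.8], aspiration=0.5)
```

The `adapt` branch is only one possible reading of "step-by-step updating of the objective level"; the paper's actual update rule may differ.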

© 2023 The Japanese Society for Artificial Intelligence