Dual Reinforcement Learning of the Satisficing Level in Goal-Oriented Exploration

中村 航; 高橋 達二; 甲野 佑

doi:10.11517/pjsai.JSAI2025.0_3Win507

Abstract

When humans begin a new endeavor, they initially focus on acquiring basic skills and progressively advance to intermediate and advanced levels. In essence, the focus is on achieving a goal rather than optimizing from the outset. Based on this idea, we decompose reinforcement learning into two processes: goal-oriented exploration and stepwise goal adjustment. Our algorithm, Risk-sensitive Satisficing (RS), quickly achieves satisficing by minimizing a subjective regret defined by the goal. RS also dynamically optimizes the goal in bandit problems, matching Thompson Sampling performance without requiring prior knowledge. While this demonstrates the usefulness of decomposing reinforcement learning into two key elements, current RS goal adjustment methods remain limited to bandit problems. In this study, we propose a general goal adjustment algorithm based on reinforcement learning for motor control. By integrating two simple reinforcement learning processes - rapid goal attainment and one-dimensional goal optimization - we successfully operationalize the concept of a goal.
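To make the idea of goal-oriented exploration in a bandit problem concrete, here is a minimal sketch. The abstract does not give the RS value function, so the form used below, (n_i / N) * (mean_i - aspiration), is an assumption based on how RS is commonly stated for bandits. The function names, the Bernoulli reward model, and the fixed aspiration level are hypothetical illustration choices; the goal adjustment process proposed in the paper is not shown.

```python
import random


def rs_values(counts, means, aspiration):
    """Assumed RS value per arm: (n_i / N) * (mean_i - aspiration).

    Arms whose estimated mean is below the aspiration level get a negative
    value that shrinks toward zero the less they have been tried, which
    favors under-explored arms. An arm above the aspiration level gets a
    positive value that grows as it is selected, which favors exploiting it.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [(n / total) * (m - aspiration) for n, m in zip(counts, means)]


def run_rs_bandit(probs, aspiration, steps, seed=0):
    """Run greedy selection on the assumed RS value for a Bernoulli bandit."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k      # number of pulls per arm
    means = [0.0] * k     # running mean reward per arm
    for _ in range(steps):
        values = rs_values(counts, means, aspiration)
        best = max(values)
        # Break ties uniformly at random among the best-valued arms.
        arm = rng.choice([i for i, v in enumerate(values) if v == best])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    # Aspiration level placed between the two arms' true reward probabilities.
    counts, means = run_rs_bandit([0.2, 0.8], aspiration=0.5, steps=1000)
    print(counts, means)
```

In this sketch the aspiration level is a fixed input. Placing it between the best and second-best arms is exactly the prior knowledge that a goal adjustment process would need to replace.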
