Dynamic Adjustment of the Goal Level in Natural Reinforcement Learning

海老原 永輝; 高橋 達二; 甲野 佑

doi:10.11517/pjsai.JSAI2023.0_4E2GS202

Abstract

When humans face an unfamiliar reinforcement learning task, they typically search quickly until they reach a certain level of performance and stop searching once that level is achieved. This property motivated the search method Risk-sensitive Satisficing (RS), proposed in previous studies. We have shown that RS is more efficient in trial and error and performs as well as or better than conventional methods that aim for optimization. RS has been extended to learning with state transitions by combining it with Global Reference Conversion (GRC), a method that converts the aspiration level for the task as a whole into an aspiration level for each state; the combination is called RS+GRC. However, although the current RS+GRC performs well when the optimal aspiration level is given, how to adjust the aspiration level proactively has not been discussed in depth. In this study, we propose a dynamic, stepwise goal modification algorithm for reinforcement learning based on goal attainment, aiming to handle tasks in which the scale of the reward function and the level of task attainment are unknown.
