Proceedings of the Annual Conference of JSAI, 32nd (2018)
Online ISSN: 2758-7347
Session ID: 1N1-04

Analysis of cognitive satisficing value function
Guaranteed satisficing and finite regret
*Akihiro TAMATSUKURI, Tatsuji TAKAHASHI

Abstract

As the domains of reinforcement learning become more complicated and realistic, standard optimization algorithms may not work well. In this paper we introduce a simple mathematical model called RS (reference satisficing) that implements a satisficing strategy: it looks for actions whose values exceed an aspiration level. We apply RS to K-armed bandit problems. We theoretically show that, if there are actions with values above the aspiration level, RS is guaranteed to find them. Furthermore, if the aspiration level is set to an "optimal level" so that satisficing practically amounts to optimizing, we prove that the regret (the expected loss) is upper bounded by a finite value. We confirm these results by simulations and clarify the effectiveness of RS through comparison with other algorithms.
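The RS value function itself is not given in this abstract. As a rough illustration of the behaviour it describes (prefer actions whose estimated values exceed an aspiration level, keep searching otherwise), the following Python sketch implements a generic aspiration-based satisficing policy for a K-armed bandit. The class name, the exploration rule, and all parameters are illustrative assumptions, not the authors' RS model.

import random

class SatisficingBandit:
    """Minimal aspiration-based satisficing agent for a K-armed bandit.

    An illustrative sketch of the general idea in the abstract, not the
    authors' exact RS value function.
    """

    def __init__(self, k, aspiration):
        self.k = k                      # number of arms
        self.aspiration = aspiration    # aspiration (satisficing) level
        self.counts = [0] * k           # pulls per arm
        self.means = [0.0] * k          # empirical mean reward per arm

    def select(self):
        # Exploit any arm whose empirical mean already satisfies the
        # aspiration level; among those, take the best-looking one.
        satisfactory = [i for i in range(self.k)
                        if self.counts[i] > 0 and self.means[i] >= self.aspiration]
        if satisfactory:
            return max(satisfactory, key=lambda i: self.means[i])
        # Otherwise keep searching: try the least-explored arm.
        return min(range(self.k), key=lambda i: self.counts[i])

    def update(self, arm, reward):
        # Incremental update of the empirical mean for the pulled arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Usage (hypothetical setup): Bernoulli bandit with arm means 0.2, 0.5, 0.8
# and aspiration level 0.7, so only the third arm is satisfactory.
if __name__ == "__main__":
    true_means = [0.2, 0.5, 0.8]
    agent = SatisficingBandit(k=3, aspiration=0.7)
    for _ in range(1000):
        arm = agent.select()
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        agent.update(arm, reward)
    print(agent.counts, [round(m, 2) for m in agent.means])

In this sketch the agent stops exploring once some arm's empirical mean clears the aspiration level, which mirrors the "guaranteed satisficing" behaviour described above; it is not intended to reproduce the paper's regret bound.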

© 2018 The Japanese Society for Artificial Intelligence