Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
Humans exhibit a decision-making tendency called satisficing: they stop exploring once they find an option that exceeds a criterion (the aspiration level). The Risk-sensitive Satisficing (RS) model is a value function that enables efficient non-random exploration and realizes satisficing in reinforcement learning (Tamatsukuri & Takahashi, 2019). To apply RS to continuous state spaces, we extended it to Linear RS (LinRS) using linear function approximation and tested its performance on contextual bandit problems. LinRS outperformed existing algorithms in probabilistic environments. We also found that the aspiration level needs to be corrected to compensate for the approximation error.
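To make the idea concrete, the following is a minimal sketch of an RS-style satisficing rule combined with linear reward estimates in a contextual bandit. The class name, the reliability term (relative selection frequency), and the update rules are illustrative assumptions, not the authors' exact LinRS formulation; the key point is that each action's value is its reliability-weighted deviation from the aspiration level, so exploration stops once a satisfactory action is found.

```python
import numpy as np

class LinRSBandit:
    """Illustrative sketch of a satisficing (RS-style) contextual bandit
    with linear function approximation. Names and updates are assumptions
    for illustration, not the paper's exact algorithm."""

    def __init__(self, n_actions, dim, aleph=0.7, lr=0.1):
        self.aleph = aleph                        # aspiration level
        self.lr = lr                              # learning rate
        self.theta = np.zeros((n_actions, dim))   # linear reward estimates
        self.counts = np.ones(n_actions)          # pseudo-counts (reliability)

    def rs_values(self, x):
        est = self.theta @ x                      # estimated reward per action
        rho = self.counts / self.counts.sum()     # relative selection frequency
        # RS value: reliability-weighted deviation from the aspiration level.
        # Below-aspiration, rarely tried actions look relatively attractive;
        # once an action's estimate exceeds aleph, it is exploited.
        return rho * (est - self.aleph)

    def select(self, x):
        return int(np.argmax(self.rs_values(x)))

    def update(self, x, action, reward):
        err = reward - self.theta[action] @ x
        self.theta[action] += self.lr * err * x
        self.counts[action] += 1


if __name__ == "__main__":
    # Toy deterministic bandit: action 0 always pays 1, action 1 pays 0.
    agent = LinRSBandit(n_actions=2, dim=1, aleph=0.7, lr=0.1)
    x = np.array([1.0])
    for _ in range(200):
        a = agent.select(x)
        agent.update(x, a, 1.0 if a == 0 else 0.0)
    print(agent.select(x))  # settles on the satisfying action
```

Note how the rule first alternates between arms while both sit below the aspiration level, then locks onto action 0 once its estimate exceeds aleph; this is the non-random, criterion-driven exploration the RS model is designed for.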