Theory of Internal Reinforcement Learning: Introducing Quality into Rewards

Tatsuji Takahashi (高橋 達二)

doi:10.11517/pjsai.JSAI2023.0_2Q1OS27a01

Abstract

The theory of reinforcement learning (RL) is organized around the optimality principle of reward maximization and builds on Markov decision processes, dynamic programming, and Monte Carlo methods. In this paper, we propose a simple modification to how the theoretical framework of RL treats rewards. We introduce "quality" into rewards, which are usually regarded as purely quantitative (that is, ordered only by a total order relation). This is done by transforming rewards with an aspiration level (a zero level), a threshold against which values are judged. This small modification yields an alternative definition of subjective (or internal) regret and, from it, an existing model, Risk-sensitive Satisficing (RS), which represents rational risk attitudes and achieves the qualitatively high performance of bounded (objective) regret. We review the theoretical framework of our natural (internal) RL, compare it with artificial (external) RL and with Simon's paradigm of bounded rationality and satisficing, answer frequently asked questions, and discuss where we go from here. In particular, we discuss the possibilities of modeling societies and economies.
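The abstract describes rewards being transformed by an aspiration level that serves as their zero point, and names Risk-sensitive Satisficing (RS) as the resulting model. As an illustration only, the sketch below assumes the form of RS commonly cited in the bandit literature, RS(a) = (n(a)/N)(E(a) − ℵ), where n(a) is the number of times arm a was chosen, N the total number of choices, E(a) the sample mean reward, and ℵ the aspiration level. This is not the paper's own code or necessarily its exact formulation; the function names, the Bernoulli bandit setting, and the parameter `aleph` are hypothetical choices for the example.

```python
import random


def rs_values(counts, means, aleph):
    """Risk-sensitive satisficing value per arm (assumed form, hypothetical helper).

    RS(a) = (n(a) / N) * (E(a) - aleph): rewards are shifted so that the
    aspiration level aleph becomes the zero point.
    """
    total = sum(counts)
    return [(n / total) * (m - aleph) for n, m in zip(counts, means)]


def run_rs_bandit(probs, aleph, steps, seed=0):
    """Play a Bernoulli bandit by greedy selection on RS values (illustrative sketch)."""
    if steps < len(probs):
        raise ValueError("steps must be at least the number of arms")
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    # Pull every arm once so that each count is positive.
    for a in range(k):
        counts[a] = 1
        means[a] = 1.0 if rng.random() < probs[a] else 0.0
    for _ in range(steps - k):
        values = rs_values(counts, means, aleph)
        # Greedy on RS; ties go to the lowest-indexed arm.
        a = max(range(k), key=lambda i: values[i])
        r = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental sample mean
    return counts, means


if __name__ == "__main__":
    counts, means = run_rs_bandit([0.2, 0.5, 0.8], aleph=0.7, steps=1000)
    print(counts, [round(m, 3) for m in means])
```

The design choice this illustrates: an arm whose estimate lies above ℵ has a positive RS value that grows as it is chosen more, so the agent keeps exploiting it, whereas when no arm reaches ℵ every value is negative and frequently chosen arms are penalized most, which pushes the agent to explore. With two arms that never pay and ℵ = 0.5, for example, the sketch alternates between them.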
