Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
The theory of reinforcement learning (RL) is organized around the optimality principle of maximizing rewards and builds on Markov decision processes, dynamic programming, and Monte Carlo methods. In this paper, we propose a simple modification to how rewards are treated in the theoretical framework of RL. We introduce a notion of "quality" into rewards, which are usually considered purely quantitative (carrying only a total order relation). This is done by transforming rewards with respect to an aspiration level (zero level), which is a threshold on values. This small modification leads to a new definition of subjective (or internal) regret, and from there to an existing model, Risk-sensitive Satisficing (RS), to a representation of rational risk attitudes, and to qualitatively high performance in the form of bounded (objective) regret. We review the theoretical framework of our natural (internal) RL, compare it with artificial (external) RL and with Simon's paradigm of bounded rationality and satisficing, answer frequently asked questions, and discuss where we go from here. In particular, we discuss the possibilities of modeling societies and economies.
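
As a rough illustration of the reward transformation described above, the following sketch assumes the simplest possible reading: each reward is shifted by the aspiration level, so that zero becomes the boundary between satisfactory and unsatisfactory outcomes. The function names and the subtraction itself are assumptions made for illustration only; the abstract does not state the exact form of the transformation, and the paper's definition may differ.

```python
def shift_by_aspiration(rewards, aspiration):
    """Re-express rewards relative to an aspiration level.

    Hypothetical helper: assumes the transformation is a plain shift,
    which the abstract does not confirm.
    """
    return [r - aspiration for r in rewards]


def is_satisfactory(shifted_reward):
    """A shifted reward at or above zero meets the aspiration level."""
    return shifted_reward >= 0


# Example values chosen for illustration, not taken from the paper.
rewards = [0.2, 0.5, 0.9]
shifted = shift_by_aspiration(rewards, aspiration=0.5)
labels = [is_satisfactory(s) for s in shifted]
```

Under this reading, the sign of a shifted reward carries the qualitative information (satisfactory or not), while its magnitude keeps the usual quantitative ordering.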