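Relationship Between Asymmetric Positive and Negative Learning Rates in Human Reinforcement Learning and Reward Sparsity/Density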

宝田 悠; 太田 宏之; 樋口 滉規; 高橋 達二

doi:10.11517/pjsai.JSAI2024.0_4N1GS104

Abstract

Humans and animals learn from both successes and failures. When an action is followed by a reward, its value increases and the action is chosen more often afterward. When it is not rewarded, its value decreases and the action is chosen less often. This process is known as reinforcement learning. The coefficient that determines how much an action's value increases is called the positive learning rate, and the coefficient that determines how much it decreases is called the negative learning rate. In almost all reinforcement learning models used in AI, the positive and negative learning rates are set to be identical and constant. However, recent studies have found that some animals learn asymmetrically, that is, with different positive and negative learning rates, and that these learning rates change adaptively according to the reward distribution. Do humans also learn asymmetrically and adaptively? We examined this question with an online bandit experiment. We also conducted a separate decision-making experiment to analyze the results in terms of the relationship between experienced and described decision-making environments.
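The asymmetric update described in the abstract can be illustrated with a minimal sketch. It assumes a standard delta-rule (Q-learning style) value update with softmax action selection on a Bernoulli bandit; the abstract does not specify the authors' actual model. With prediction error δ = r − Q(a), the chosen action's value changes by α⁺δ when δ > 0 and by α⁻δ otherwise, so setting α⁺ = α⁻ recovers the symmetric learner common in AI. The function names, the initial value of 0.5, the inverse temperature `beta`, and the reward probabilities used below are hypothetical choices for illustration only.

```python
import math
import random


def update_value(q: float, reward: float, alpha_pos: float, alpha_neg: float) -> float:
    """Return the updated action value using separate positive/negative learning rates."""
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg  # asymmetric learning rate
    return q + alpha * delta


def softmax_choice(values, beta, rng):
    """Sample an action index with probability proportional to exp(beta * Q)."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def simulate(reward_probs, alpha_pos, alpha_neg, beta=5.0, n_trials=200, seed=0):
    """Simulate one hypothetical agent on a Bernoulli bandit.

    Assumptions (not from the paper): initial values of 0.5, softmax choice
    with inverse temperature beta, and binary rewards of 0 or 1.
    Returns the final action values and the sequence of chosen actions.
    """
    rng = random.Random(seed)
    q = [0.5] * len(reward_probs)
    choices = []
    for _ in range(n_trials):
        a = softmax_choice(q, beta, rng)
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        q[a] = update_value(q[a], r, alpha_pos, alpha_neg)
        choices.append(a)
    return q, choices
```

For example, `simulate([0.1, 0.2], alpha_pos=0.4, alpha_neg=0.1)` models a hypothetical sparse-reward environment and `simulate([0.8, 0.9], alpha_pos=0.1, alpha_neg=0.4)` a dense one. Comparing how often the better arm is chosen under different (α⁺, α⁻) pairs is one way to explore why an asymmetry might be adaptive to reward sparsity, though this sketch makes no claim about the paper's findings.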
