A Study of an Action Selection Probability Model That Accounts for Selection Bias in Reinforcement Learning-Based Behavior Modeling

村山 友貴; 堀尾 恵一; 久保田 良輔

doi:10.14864/fss.40.0_192

Abstract

In this study, we propose an action selection probability model for reinforcement learning-based behavior modeling that accounts for a bias toward a particular action. The model assumes that factors other than reward influence action selection, and it represents this influence by applying a parallel shift to the softmax function when the action selection probability is computed. Specifically, the shift is obtained by adding a bias term to the difference between the action values in each state. The bias value is estimated by maximum likelihood together with the learning rate and the inverse temperature, the two parameters of the conventional reinforcement learning model. To confirm the effectiveness of the proposed method, we used a two-armed bandit problem, a standard benchmark task, to artificially generate data in which a particular action tends to be chosen independently of reward, and we compared the likelihoods of the conventional and proposed models on this data. The results showed that the likelihood of the proposed method was significantly higher than that of the conventional method.
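
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact placement of the bias (added to the action-value difference before scaling by the inverse temperature), the delta-rule value update, the coarse grid search standing in for maximum likelihood estimation, and all names (`p_choose_first`, `negative_log_likelihood`, `simulate`, `fit`) are assumptions made for illustration.

```python
import math
import random


def p_choose_first(q, beta, bias):
    """Probability of choosing action 0 under a shifted two-action softmax.

    Assumption: the bias is added to the action-value difference q[0] - q[1]
    before scaling by the inverse temperature beta. bias = 0 recovers the
    ordinary softmax.
    """
    return 1.0 / (1.0 + math.exp(-beta * ((q[0] - q[1]) + bias)))


def negative_log_likelihood(actions, rewards, alpha, beta, bias=0.0):
    """Negative log-likelihood of an observed choice sequence."""
    q = [0.0, 0.0]
    nll = 0.0
    for a, r in zip(actions, rewards):
        p0 = p_choose_first(q, beta, bias)
        p = p0 if a == 0 else 1.0 - p0
        nll -= math.log(max(p, 1e-12))  # guard against log(0)
        q[a] += alpha * (r - q[a])      # assumed delta-rule update
    return nll


def simulate(n_trials, alpha, beta, bias, reward_probs=(0.5, 0.5), seed=0):
    """Generate artificial two-armed bandit data from a biased agent."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    actions, rewards = [], []
    for _ in range(n_trials):
        a = 0 if rng.random() < p_choose_first(q, beta, bias) else 1
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        actions.append(a)
        rewards.append(r)
        q[a] += alpha * (r - q[a])
    return actions, rewards


def fit(actions, rewards, biases):
    """Coarse grid search (a stand-in for a proper optimizer).

    Returns the best (nll, alpha, beta, bias) tuple found on the grid.
    """
    alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
    betas = [0.5, 1.0, 2.0, 3.0, 5.0]
    return min(
        (negative_log_likelihood(actions, rewards, al, be, bi), al, be, bi)
        for al in alphas
        for be in betas
        for bi in biases
    )


if __name__ == "__main__":
    actions, rewards = simulate(n_trials=200, alpha=0.3, beta=2.0, bias=1.0)
    conventional = fit(actions, rewards, biases=[0.0])
    proposed = fit(actions, rewards, biases=[-1.0, -0.5, 0.0, 0.5, 1.0])
    print("conventional (nll, alpha, beta, bias):", conventional)
    print("proposed     (nll, alpha, beta, bias):", proposed)
```

Because the bias grid of the proposed model contains zero, its best negative log-likelihood on this grid can never exceed that of the conventional model. A fair comparison should therefore also penalize the extra parameter, for example with an information criterion; the abstract does not state which comparison the authors used.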
