Social Reinforcement Learning by Sharing a Global Aspiration Level

其田 憲明; 神谷 匠; 高橋 達二

doi:10.11517/pjsai.JSAI2019.0_3K3J204

Abstract

Humans learn not only through individual trial and error; learning is also accelerated by sharing information with others. Social learning strategies include imitating others’ actions and emulating someone’s high achievement. As a model of social learning, reinforcement learning algorithms often implement the sharing of state values and/or action values. However, sharing such a large amount of information is not realistic as a model of social learning in humans or animals. We propose an algorithm in which sharing a mere “record” (the accumulated reward achieved per episode) leads to efficient social learning. The algorithm is based on a model of satisficing that integrates different risk attitudes around the reference (aspiration level), together with the conversion of the global aspiration into an aspiration level for each state.
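
The abstract does not give the update rules, so the following Python sketch is only one possible reading of the ideas it names. It assumes a satisficing score of the form "trial count times the gap between the value estimate and the aspiration level", which yields exploration of less-tried actions below the aspiration and exploitation of well-tried actions above it. The function names (`rs_value`, `select_action`, `shared_record`, `local_aspiration`), the use of the best episode return as the shared record, and the shortfall-based conversion to a per-state aspiration are all hypothetical choices made for illustration, not the authors' stated formulation.

```python
from typing import Sequence


def rs_value(q: float, n: int, aspiration: float) -> float:
    """Assumed satisficing score: trial count times (value - aspiration)."""
    return n * (q - aspiration)


def select_action(q_values: Sequence[float], counts: Sequence[int],
                  aspiration: float) -> int:
    """Pick the action with the highest satisficing score.

    Below the aspiration, scores are negative and shrink with more trials,
    so less-tried actions win (risk-seeking). Above it, scores grow with
    trials, so well-tried actions win (risk-averse).
    """
    scores = [rs_value(q, n, aspiration) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def shared_record(episode_returns: Sequence[float]) -> float:
    """Assumed sharing rule: agents exchange only the best episode return."""
    return max(episode_returns)


def local_aspiration(q_values: Sequence[float], global_aspiration: float,
                     recent_return: float, scale: float = 1.0) -> float:
    """Hypothetical conversion of the global aspiration to one state.

    If the agent's recent return falls short of the shared record, the
    state's aspiration is raised above its best value estimate by the
    (scaled) shortfall; otherwise it equals the best estimate.
    """
    shortfall = max(global_aspiration - recent_return, 0.0)
    return max(q_values) + scale * shortfall


if __name__ == "__main__":
    record = shared_record([3.0, 7.5, 5.0])   # one scalar shared per group
    aleph = local_aspiration([1.0, 2.0], record, recent_return=6.0)
    print(record, aleph, select_action([1.0, 2.0], [8, 3], aleph))
```

Under these assumptions, each agent transmits a single scalar per episode rather than a full table of state or action values, which is the contrast the abstract draws.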
