In this paper, we propose a method by which agents acquire cooperative actions that reach an appropriate goal without requiring designers to tune the reward. To accomplish this, we introduce a new concept, reward interpretation: the idea that an agent can itself increase or decrease the reward given by the environment. We apply this idea to Q-learning. Simulation results show that the proposed method outperforms both standard Q-learning and Q-learning with cooperation in terms of the number of successful cooperations.
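
As a rough illustration of how an agent-side reward adjustment could enter a Q-learning update, the following is a minimal sketch only. The abstract does not specify the form of the interpretation, so the multiplicative `scale` factor, the function names `interpret_reward` and `q_update`, and the values of `alpha` and `gamma` are all assumptions made for this example, not the method of the paper.

```python
from collections import defaultdict


def interpret_reward(reward, scale):
    """Hypothetical interpretation step: the agent rescales the environment reward.

    A multiplicative scale is an assumption; the abstract only says the agent
    can increase or decrease the reward by itself.
    """
    return scale * reward


def q_update(q, state, action, reward, next_state, actions,
             scale=1.0, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update that uses the interpreted reward."""
    r = interpret_reward(reward, scale)
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = r + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


actions = ["left", "right"]

# Same transition, interpreted neutrally (scale=1.0) and amplified (scale=2.0).
q_plain = defaultdict(float)
plain = q_update(q_plain, "s0", "right", 1.0, "s1", actions, scale=1.0)

q_boosted = defaultdict(float)
boosted = q_update(q_boosted, "s0", "right", 1.0, "s1", actions, scale=2.0)
```

With `scale=1.0` the update reduces to standard Q-learning, so in this sketch the interpretation acts purely as an agent-side modification that leaves the environment's reward signal unchanged.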