Host: The Japanese Society for Artificial Intelligence
Name : The 35th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 35
Location : [in Japanese]
Date : June 08, 2021 - June 11, 2021
This paper examines how Q-learning acquires (non-)cooperative behavior in a repeated prisoner's dilemma in which players can misperceive the opponent's actions. How people cooperate is a fundamental, interdisciplinary question in artificial intelligence, economics, biology, and other fields. Under such misperception, even the well-known tit-for-tat strategy (TFT) has difficulty sustaining cooperation because misperception triggers retaliation. On the other hand, it has been shown that a minor but important strategy, Win-Stay, Lose-Shift (WSLS), can effectively recover cooperation even after misperception. The main question of this paper is whether simple Q-learning can learn resilient cooperative behavior such as WSLS. To this end, we first propose a Q-learning system called Neural Replicator Dynamics with Mutation (NeuRD+M) for games with misperception and then observe that NeuRD+M outperforms two existing Q-learning systems with respect to rewards and cooperation rates and learns the behavior of WSLS.
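The contrast between TFT and WSLS under misperception that motivates the paper can be illustrated with a minimal simulation. The sketch below is not the paper's implementation (it does not involve Q-learning or NeuRD+M); the payoff values and the noise rate are assumptions chosen only to show that a perception error locks two TFT players into retaliation while WSLS quickly restores mutual cooperation.

```python
# Illustrative sketch (assumed payoffs and noise rate, not the paper's code):
# WSLS recovers cooperation after a misperceived action, while TFT retaliates.
import random

C, D = 0, 1
# Standard prisoner's dilemma payoffs (row player, column player), assumed here.
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tft(my_last, perceived_opp_last):
    # Tit-for-Tat: copy the (possibly misperceived) opponent action.
    return perceived_opp_last

def wsls(my_last, perceived_opp_last):
    # Win-Stay, Lose-Shift: keep the action after a "good" perceived outcome
    # (payoff 3 or 5), switch after a "bad" one (payoff 0 or 1).
    my_payoff = PAYOFF[(my_last, perceived_opp_last)][0]
    return my_last if my_payoff >= 3 else 1 - my_last

def cooperation_rate(strategy, noise=0.05, rounds=1000, seed=0):
    rng = random.Random(seed)
    a = b = C                      # both players start by cooperating
    coop = 0
    for _ in range(rounds):
        coop += (a == C) + (b == C)
        # Each player misperceives the opponent's action with probability `noise`.
        seen_b = b if rng.random() > noise else 1 - b
        seen_a = a if rng.random() > noise else 1 - a
        a, b = strategy(a, seen_b), strategy(b, seen_a)
    return coop / (2 * rounds)

for name, s in [("TFT", tft), ("WSLS", wsls)]:
    # WSLS keeps a much higher cooperation rate than TFT under noise.
    print(name, round(cooperation_rate(s), 3))
```

A resilient Q-learner in this setting would be expected to behave like the `wsls` function above: a single perceived defection is punished once and cooperation then resumes, rather than echoing indefinitely as with `tft`.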