Host: The Japanese Society for Artificial Intelligence
Name : The 35th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 35
Location : [in Japanese]
Date : June 08, 2021 - June 11, 2021
This paper examines how Q-learning acquires (non-)cooperative behavior in a repeated prisoner's dilemma in which players can misperceive the opponent's actions. How people cooperate is a fundamental, interdisciplinary question in artificial intelligence, economics, biology, and other fields. Under such misperception, even the well-known tit-for-tat strategy (TFT) has difficulty sustaining cooperation because misperception triggers retaliation. On the other hand, it has been shown that a minor but important strategy, Win-Stay, Lose-Shift (WSLS), can effectively recover cooperation even after misperception. The main question of this paper is whether simple Q-learning can learn resilient cooperative behavior such as WSLS. To this end, we first propose a Q-learning system called Neural Replicator Dynamics with Mutation (NeuRD+M) for games with misperception and then observe that NeuRD+M outperforms two existing Q-learning systems with respect to rewards and cooperation rates and learns the behavior of WSLS.
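The contrast between TFT and WSLS under misperception that motivates the paper can be illustrated with a minimal simulation. The sketch below is not the paper's implementation (it does not involve Q-learning or NeuRD+M); the payoff values and the noise rate are assumptions chosen only to show that a perception error locks two TFT players into retaliation while WSLS quickly restores mutual cooperation.

```python
# Illustrative sketch (assumed payoffs and noise rate, not the paper's code):
# WSLS recovers cooperation after a misperceived action, while TFT retaliates.
import random

C, D = 0, 1
# Standard prisoner's dilemma payoffs (row player, column player), assumed here.
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tft(my_last, perceived_opp_last):
    # Tit-for-Tat: copy the (possibly misperceived) opponent action.
    return perceived_opp_last

def wsls(my_last, perceived_opp_last):
    # Win-Stay, Lose-Shift: keep the action after a "good" perceived outcome
    # (payoff 3 or 5), switch after a "bad" one (payoff 0 or 1).
    my_payoff = PAYOFF[(my_last, perceived_opp_last)][0]
    return my_last if my_payoff >= 3 else 1 - my_last

def cooperation_rate(strategy, noise=0.05, rounds=1000, seed=0):
    rng = random.Random(seed)
    a = b = C                      # both players start by cooperating
    coop = 0
    for _ in range(rounds):
        coop += (a == C) + (b == C)
        # Each player misperceives the opponent's action with probability `noise`.
        seen_b = b if rng.random() > noise else 1 - b
        seen_a = a if rng.random() > noise else 1 - a
        a, b = strategy(a, seen_b), strategy(b, seen_a)
    return coop / (2 * rounds)

for name, s in [("TFT", tft), ("WSLS", wsls)]:
    # WSLS keeps a much higher cooperation rate than TFT under noise.
    print(name, round(cooperation_rate(s), 3))
```

A resilient Q-learner in this setting would be expected to behave like the `wsls` function above: a single perceived defection is punished once and cooperation then resumes, rather than echoing indefinitely as with `tft`.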