Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
35th (2021)
Session ID : 2I1-GS-5a-03

Q-Learning in Prisoner's Dilemma with Noisy Observations
*Mitsuki SAKAMOTO, Atsushi IWASAKI
Abstract

This paper examines how Q-learning acquires (non-)cooperative behavior in a repeated prisoner's dilemma where players can misperceive the opponent's actions. How people cooperate is a fundamental and interdisciplinary question in artificial intelligence, economics, biology, and other fields. Under such misperception, even the well-known tit-for-tat strategy (TFT) has difficulty maintaining cooperation, because a misperceived defection triggers retaliation. On the other hand, it has been shown that a minor but important strategy, Win-Stay, Lose-Shift (WSLS), can effectively recover cooperation even after a misperception. The main question of this paper is whether simple Q-learning can learn a resilient cooperative behavior like that of WSLS. To this end, we first propose a Q-learning system called Neural Replicator Dynamics with Mutation (NeuRD+M) for games with misperception and then observe that NeuRD+M outperforms two existing Q-learning systems with respect to rewards and cooperation rates and learns the behavior of WSLS.
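The contrast between TFT and WSLS under misperception can be illustrated with a small self-play simulation. The sketch below is not the paper's NeuRD+M system; it only uses the standard definitions of TFT (copy the opponent's perceived last action) and WSLS (cooperate iff both players appeared to choose the same action last round), with a single injected observation flip standing in for noise. The helper names and the choice of Python are illustrative assumptions, not from the paper.

```python
C, D = "C", "D"

def wsls(own_prev, opp_perceived):
    # Win-Stay, Lose-Shift (Pavlov): cooperate iff both appeared to
    # play the same action last round, otherwise defect.
    return C if own_prev == opp_perceived else D

def tft(own_prev, opp_perceived):
    # Tit-for-Tat: copy the opponent's (perceived) previous action.
    return opp_perceived

def simulate(strategy, rounds=8, flip_round=0, flip_player=0):
    # Self-play of `strategy` in a repeated prisoner's dilemma where one
    # player's observation is flipped exactly once (a single misperception).
    actions = [C, C]                          # both start by cooperating
    history = [tuple(actions)]
    for t in range(rounds):
        perceived = [actions[1], actions[0]]  # each sees the other's action
        if t == flip_round:                   # inject one misperception
            perceived[flip_player] = D if perceived[flip_player] == C else C
        actions = [strategy(actions[i], perceived[i]) for i in range(2)]
        history.append(tuple(actions))
    return history

if __name__ == "__main__":
    print("WSLS vs WSLS:", simulate(wsls))  # returns to (C, C) within two rounds
    print("TFT  vs TFT :", simulate(tft))   # locked into alternating retaliation
```

Running this, the WSLS pair passes through (D, C) and (D, D) and then restores mutual cooperation, whereas the TFT pair alternates (D, C), (C, D), (D, C), ... indefinitely, which is the resilience property the abstract asks whether Q-learning can acquire.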

© 2021 The Japanese Society for Artificial Intelligence