Numerical Analysis of Nash Equilibria in the Iterated Prisoner's Dilemma with Reinforcement Learning as Strategies

鳥居 拓馬; 日高 昇平

doi:10.11517/pjsai.JSAI2020.0_1P5GS703

Abstract

The Iterated Prisoner's Dilemma (IPD) has been a standard tool for studying social dilemmas. Because classic game-theoretic analyses of the IPD have ended in mutual defection, another class of IPDs, played between reinforcement learners, has been explored. However, the basic nature of this class of games has not yet been well understood. In the present paper, we analyze the Nash equilibria of the IPD between reinforcement learners. In the standard IPD, the only Nash equilibrium resulting from rational choices is known to be the worst outcome for both players. In contrast to both previous lines of research, our analysis shows that in the IPD with reinforcement learners, the individually rational choices coincide with the mutually beneficial outcome for both players. This result suggests that the social dilemma is dissolved between learning agents of this type.
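
The classic result mentioned above can be illustrated on the one-shot stage game. The following is a minimal sketch, not the analysis carried out in the paper: the payoff values T=5, R=3, P=1, S=0 and the helper name `pure_nash_equilibria` are assumptions chosen for illustration, and only pure strategies of the single-round game are checked.

```python
from itertools import product

# Assumed illustrative payoffs (not taken from the paper): T > R > P > S.
T, R, P, S = 5, 3, 1, 0
ACTIONS = ("C", "D")  # cooperate, defect

# PAYOFF[(a1, a2)] = (payoff to player 1, payoff to player 2)
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def pure_nash_equilibria(payoff, actions):
    """Return action pairs from which neither player gains by deviating alone."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoff[(a1, a2)]
        # Player 1 compares against every alternative while a2 is held fixed.
        best1 = all(u1 >= payoff[(d, a2)][0] for d in actions)
        # Player 2 compares against every alternative while a1 is held fixed.
        best2 = all(u2 >= payoff[(a1, d)][1] for d in actions)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


equilibria = pure_nash_equilibria(PAYOFF, ACTIONS)
print(equilibria)
```

Under these assumed payoffs the only pair that survives the unilateral-deviation check is mutual defection, even though mutual cooperation pays both players more. This is the dilemma that the abstract reports as dissolved when the players are reinforcement learners.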
