Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
36th (2022)
Session ID : 1N1-GS-5-01

Optimization of subjective utility to derive cooperative actions in a prisoner's dilemma environment
*Ryoichi TAKATSUKA, Koichi MORIYAMA, Atsuko MUTOH, Tohgoroh MATSUI, Nobuhiro INUZUKA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In society, there are situations called social dilemmas, in which either the individual interest or the public interest must be given priority. It is known that humans do not always prioritize individual interests in such situations. In contrast, reinforcement learning agents maximize their individual rewards by design, which is undesirable in a social dilemma. To address this problem, a method was previously proposed that derives a utility from the reward by evolutionary computation and uses that utility in reinforcement learning; it leads to cooperative behavior in the two-player prisoner's dilemma game, one of the models of social dilemmas. However, in that method the form of the utility-deriving function is fixed and only its coefficients are evolved, so it is not clear what kind of function is suitable. Therefore, in order to optimize the function itself, this study represents the utility-deriving function with a three-layer perceptron, which can represent arbitrary functions, and obtains its weights by evolutionary computation. We then investigate whether mutual cooperation emerges and what form the evolved utility-deriving function takes. Simulation experiments show that, regardless of the number of neurons in the middle layer, the evolved functions satisfy a specific relation and produce mutual cooperation in the two-player prisoner's dilemma game.
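
The pipeline described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Everything concrete here is an assumption made for illustration: the payoff values, the tanh hidden units, the perceptron taking only the agent's own reward as input, stateless epsilon-greedy Q-learning with both agents sharing one utility function, mean actual reward as fitness, and a simple truncation-selection loop with Gaussian mutation. The names `utility`, `play`, and `evolve` are hypothetical.

```python
import math
import random

# Hypothetical payoffs (T=5, R=3, P=1, S=0); the abstract gives no values.
# Actions: 0 = cooperate, 1 = defect. PAYOFF[(mine, theirs)] -> my reward.
PAYOFF = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0}


def n_weights(hidden):
    """Weights for 1 input -> `hidden` units -> 1 output, including biases."""
    return 3 * hidden + 1


def utility(weights, reward, hidden):
    """Three-layer perceptron mapping a reward to a subjective utility."""
    w1 = weights[0:hidden]                # input-to-hidden weights
    b1 = weights[hidden:2 * hidden]       # hidden biases
    w2 = weights[2 * hidden:3 * hidden]   # hidden-to-output weights
    b2 = weights[3 * hidden]              # output bias
    h = [math.tanh(w1[i] * reward + b1[i]) for i in range(hidden)]
    return sum(w2[i] * h[i] for i in range(hidden)) + b2


def play(weights, hidden, rounds=200, alpha=0.1, epsilon=0.1, seed=0):
    """Two stateless Q-learners that learn from utility instead of reward.

    Returns (mean actual reward per agent per round, mutual cooperation rate).
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    total, coop = 0.0, 0
    for _ in range(rounds):
        acts = []
        for agent in range(2):
            if rng.random() < epsilon:
                acts.append(rng.randrange(2))  # explore
            else:
                acts.append(0 if q[agent][0] >= q[agent][1] else 1)  # exploit
        for agent in range(2):
            r = PAYOFF[(acts[agent], acts[1 - agent])]
            u = utility(weights, r, hidden)  # learn from utility, not reward
            q[agent][acts[agent]] += alpha * (u - q[agent][acts[agent]])
            total += r  # fitness is measured in actual reward
        coop += int(acts == [0, 0])
    return total / (2 * rounds), coop / rounds


def evolve(hidden=3, pop_size=20, generations=30, sigma=0.3, seed=0):
    """Truncation selection with Gaussian mutation over perceptron weights."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0, 1) for _ in range(n_weights(hidden))]
           for _ in range(pop_size)]

    def fitness(w):
        return play(w, hidden, seed=seed)[0]

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]  # keep the better half
        children = [[g + rng.gauss(0, sigma) for g in p] for p in parents]
        pop = parents + children
    best = max(pop, key=fitness)
    return best, play(best, hidden, seed=seed)


if __name__ == "__main__":
    best, (mean_reward, coop_rate) = evolve()
    print("mean reward:", mean_reward, "mutual cooperation rate:", coop_rate)
```

Because fitness is the actual reward while learning is driven by the evolved utility, selection in this sketch can favor utility functions under which the learners settle on mutual cooperation. Whether it does so for a given seed and parameter setting is not guaranteed here.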

© 2022 The Japanese Society for Artificial Intelligence