Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Reward Design for Multi-Agent Reinforcement Learning with a Penalty Based on the Payment Mechanism
Natsuki Matsunami, Shun Okuhara, Takayuki Ito
JOURNAL FREE ACCESS

2021 Volume 36 Issue 5 Pages AG21-H_1-11

Abstract

In this paper, we propose a novel method of reward design for multi-agent reinforcement learning (MARL). One of the main uses of MARL is building cooperative policies among self-interested agents. Taking inspiration from mechanism design in game theory, we modify how agents are rewarded in MARL algorithms. We define a payment that reflects each agent's negative contribution to the other agents' valuations, in the same manner as the Vickrey-Clarke-Groves (VCG) mechanism. Each individual learning agent receives a reward signal consisting of two elements. The first is a reward evaluated solely on the basis of individual behavior, which on its own would lead to a greedy and selfish policy. The second is a negative reward, a penalty evaluated on the basis of the payment, which reflects the agent's negative contribution to social welfare. We call this scheme reward design for MARL based on the payment mechanism (RDPM). We experimented with RDPM in two different scenarios. The results show that RDPM can increase the social utility among agents, whereas the other reward designs achieve far less, even for basic and simplistic problems. Finally, we analyze and discuss how RDPM affects the building of a cooperative policy.
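
The two-part reward signal described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so everything here is an assumption made for illustration: the function names `vcg_style_payment` and `rdpm_reward`, the `penalty_weight` parameter, and the use of a Clarke-pivot-style payment (the other agents' total valuation without agent i, minus their total valuation with agent i present).

```python
from typing import Sequence


def vcg_style_payment(others_value_without_i: Sequence[float],
                      others_value_with_i: Sequence[float]) -> float:
    """Hypothetical Clarke-pivot-style payment for agent i.

    It measures how much the other agents' total valuation drops
    because agent i takes part. This is an assumed form, not the
    paper's stated definition.
    """
    return sum(others_value_without_i) - sum(others_value_with_i)


def rdpm_reward(individual_reward: float,
                payment: float,
                penalty_weight: float = 1.0) -> float:
    """Combine the selfish reward with a payment-based penalty.

    penalty_weight is an assumed tuning parameter; the abstract
    only says the penalty is evaluated on the basis of the payment.
    """
    return individual_reward - penalty_weight * payment


# Example: two other agents would be worth 5.0 and 3.0 without agent i,
# but only 4.0 and 1.0 once agent i acts.
payment = vcg_style_payment([5.0, 3.0], [4.0, 1.0])
reward = rdpm_reward(individual_reward=2.0, payment=payment)
```

In this example the agent's individual reward of 2.0 is outweighed by the loss of 3.0 it imposes on the others, so the combined signal is negative and discourages the selfish action.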

© The Japanese Society for Artificial Intelligence 2021