Reward Design for Multi-Agent Reinforcement Learning with a Penalty Based on the Payment Mechanism

Natsuki Matsunami; Shun Okuhara; Takayuki Ito

doi:10.1527/tjsai.36-5_AG21-H

Abstract

In this paper, we propose a novel method of reward design for multi-agent reinforcement learning (MARL). One of the main uses of MARL is building cooperative policies among self-interested agents. We draw on the concept of mechanism design from game theory to modify how agents are rewarded in MARL algorithms. We define a payment that reflects an agent's negative contribution to the other agents' valuations, in the same manner as the Vickrey-Clarke-Groves (VCG) mechanism. Each learning agent receives a reward signal consisting of two elements. The first is a reward evaluated solely on the basis of the agent's individual behavior, which on its own leads to a greedy, selfish policy. The second is a negative reward, a penalty evaluated from the payment, which reflects the agent's negative contribution to social welfare. We call this scheme reward design for MARL based on the payment mechanism (RDPM). We evaluated RDPM in two different scenarios and show that RDPM increases social utility among agents, while the other reward designs achieve far less, even on basic and simple problems. Finally, we analyze and discuss how RDPM affects the building of a cooperative policy.
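The abstract does not give RDPM's exact formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes the payment is the Clarke-pivot externality used in VCG: the other agents' total valuation in a counterfactual outcome where the agent is absent, minus their total valuation in the realized outcome. It also assumes the shaped reward is the individual reward minus a weighted payment. The function names `vcg_payment` and `shaped_reward`, the `weight` parameter, and the way counterfactual valuations are obtained are all hypothetical and are not taken from the paper.

```python
from typing import Hashable, Mapping

Agent = Hashable


def vcg_payment(agent: Agent,
                values_with: Mapping[Agent, float],
                values_without: Mapping[Agent, float]) -> float:
    """Clarke-pivot style externality that `agent` imposes on the others.

    values_with:    every agent's valuation of the realized joint outcome.
    values_without: each *other* agent's valuation of a counterfactual
                    outcome in which `agent` is absent (assumed to be
                    supplied by the caller; how RDPM computes it is not
                    stated in the abstract).
    """
    others = [j for j in values_with if j != agent]
    welfare_without = sum(values_without[j] for j in others)
    welfare_with = sum(values_with[j] for j in others)
    # Positive when the agent's presence lowers the others' total valuation.
    return welfare_without - welfare_with


def shaped_reward(agent: Agent,
                  individual_reward: float,
                  values_with: Mapping[Agent, float],
                  values_without: Mapping[Agent, float],
                  weight: float = 1.0) -> float:
    """Selfish individual reward minus a (hypothetical) weighted payment penalty."""
    penalty = weight * vcg_payment(agent, values_with, values_without)
    return individual_reward - penalty


if __name__ == "__main__":
    # Toy case: agent A's action leaves B with 1.0; without A, B would get 4.0.
    realized = {"A": 5.0, "B": 1.0}
    print(vcg_payment("A", realized, {"B": 4.0}))         # → 3.0
    print(shaped_reward("A", 5.0, realized, {"B": 4.0}))  # → 2.0
```

In this toy case, agent A's selfish reward of 5.0 is reduced to 2.0 because its action costs agent B 3.0 in valuation, while an agent whose presence does not affect the others pays nothing. This is the intuition the abstract describes: the penalty makes an agent bear the harm it causes to social welfare.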
