Host: The Japan Society of Mechanical Engineers
Name : [in Japanese]
Date : June 01, 2022 - June 04, 2022
The complexity of Multi-Agent Reinforcement Learning (MARL) problems increases exponentially with the number of agents. This poor scalability limits the application of MARL to large-scale multi-agent systems.
In this paper, we present a novel MARL algorithm that leverages a self-policy network to estimate the intentions of other agents. The intention of each other agent is estimated by backpropagating its observed action through the self-policy network. The estimated intentions are then used as input to the self-policy network. As long as the agents are cooperative, our method does not require any additional model to learn others’ intentions. We also introduce a simple curriculum learning scheme that gradually increases the number of agents. Simulation results indicate that the proposed method improves the performance of the learned policy even as the number of agents increases.
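The intention-estimation step can be illustrated with a minimal sketch. The abstract does not specify the network architecture, the intention representation, or the loss, so everything below is an assumption made for illustration: a toy linear-softmax policy stands in for the self-policy network, the intention is a small real-valued vector, and the estimate is obtained by gradient descent on the negative log-likelihood of the observed action with the policy weights held fixed. The names `policy`, `infer_intention`, `W_obs`, and `W_int` are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def policy(W_obs, W_int, obs, intention):
    """Toy stand-in for the self-policy network: action probabilities
    given an observation and an intention vector (assumed form)."""
    return softmax(W_obs @ obs + W_int @ intention)

def infer_intention(W_obs, W_int, obs, observed_action, steps=500, lr=0.05):
    """Estimate another agent's intention by backpropagating its observed
    action to the intention input, keeping the policy weights frozen."""
    intention = np.zeros(W_int.shape[1])
    for _ in range(steps):
        probs = policy(W_obs, W_int, obs, intention)
        # Gradient of -log p(observed_action) w.r.t. the logits: probs - one_hot.
        grad_logits = probs.copy()
        grad_logits[observed_action] -= 1.0
        # Chain rule from the logits back to the intention input.
        intention -= lr * (W_int.T @ grad_logits)
    return intention

# Hypothetical sizes and random weights, used only to exercise the sketch.
rng = np.random.default_rng(0)
n_actions, obs_dim, int_dim = 4, 3, 2
W_obs = rng.normal(size=(n_actions, obs_dim))
W_int = rng.normal(size=(n_actions, int_dim))
other_obs = rng.normal(size=obs_dim)
observed_action = 2

z0 = np.zeros(int_dim)
z_hat = infer_intention(W_obs, W_int, other_obs, observed_action)
p_before = policy(W_obs, W_int, other_obs, z0)[observed_action]
p_after = policy(W_obs, W_int, other_obs, z_hat)[observed_action]
```

Under these assumptions, `z_hat` is the intention that best explains the other agent's action according to the agent's own policy, and it could then be concatenated with the agent's own observation as policy input. Because the same network is reused, no separate model of the other agents is trained, which matches the cooperative setting described in the abstract.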