Organizer: The Japan Society of Mechanical Engineers
Conference: The Robotics and Mechatronics Conference 2022
Dates: 2022/06/01 - 2022/06/04
The complexity of Multi-Agent Reinforcement Learning (MARL) problems increases exponentially with the number of agents. This poor scalability with respect to the number of agents limits the application of MARL to large-scale multi-agent systems.
In this paper, we present a novel MARL algorithm that leverages a self-policy network to estimate the intentions of other agents. The intention of each other agent is estimated by backpropagating its observed action through the self-policy network. The estimated intentions are then used as input to the self-policy network. As long as the agents are cooperative, our method does not require any additional model to learn others' intentions. We also introduce a simple curriculum learning scheme that gradually increases the number of agents. Simulation results indicate that the proposed method improves the performance of the learned policy even as the number of agents increases.
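The intention-estimation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear softmax policy in place of a neural network, assumes the other agent's observation is available, and computes the gradient with respect to the intention input analytically rather than through an autodiff library. All names (`policy`, `infer_intention`, `W_obs`, `W_int`) are hypothetical. The policy weights stay fixed; only the intention vector is updated so that the observed action becomes more likely under the agent's own policy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(W_obs, W_int, obs, intent):
    """Assumed toy self-policy: softmax(W_obs @ obs + W_int @ intent)."""
    logits = [
        sum(w * o for w, o in zip(W_obs[a], obs))
        + sum(w * z for w, z in zip(W_int[a], intent))
        for a in range(len(W_obs))
    ]
    return softmax(logits)


def infer_intention(W_obs, W_int, obs, action, steps=50, lr=0.5):
    """Estimate another agent's intention from its observed action.

    Runs gradient ascent on log pi(action | obs, intent) with respect to
    the intention input only; the policy weights are never changed.
    """
    n_actions = len(W_int)
    dim = len(W_int[0])
    intent = [0.0] * dim  # assumed neutral starting guess
    for _ in range(steps):
        probs = policy(W_obs, W_int, obs, intent)
        # Gradient of log-softmax w.r.t. logits: onehot(action) - probs.
        err = [(1.0 if a == action else 0.0) - probs[a] for a in range(n_actions)]
        # Chain rule through the linear layer: W_int^T @ err.
        grad = [sum(err[a] * W_int[a][k] for a in range(n_actions)) for k in range(dim)]
        intent = [z + lr * g for z, g in zip(intent, grad)]
    return intent


# Toy example: 2 actions, 2 intention dimensions, observation ignored.
W_obs = [[0.0], [0.0]]
W_int = [[1.0, 0.0], [0.0, 1.0]]
estimated = infer_intention(W_obs, W_int, obs=[1.0], action=1)
probs_after = policy(W_obs, W_int, [1.0], estimated)
```

In this toy setup the estimated intention shifts toward the dimension that favors the observed action. In a cooperative setting the same vector could then be passed back into the agent's own policy as an extra input, which is the role the abstract describes for the estimated intentions.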