2021 Volume 36 Issue 5 Pages AG21-B_1-9
Multi-agent inverse reinforcement learning (MAIRL) is a framework for inferring expert agents' reward functions from trajectories observed in a Markov game. MAIRL alternates between two steps: computing the optimal policy for the current reward and updating the reward based on the difference between the computed policy and the expert trajectories. The former step is a bottleneck because it is itself a multi-agent reinforcement learning (MARL) problem, which suffers from non-stationarity. To avoid this problem, we propose a parallel-coordinate-descent-based MAIRL, which extends maximum discounted causal entropy inverse reinforcement learning to Markov games. A previous method based on coordinate descent updates one agent's reward and policy at a time while the other agents' policies are fixed. In contrast, the proposed method updates each agent's reward and policy in parallel and exchanges the other agents' policies synchronously to improve learning speed. In computer experiments, we compare the learning speeds of the previous and proposed methods when inferring the reward of one equilibrium solution in a two-agent grid navigation task. The experimental results show that parallelization does not always improve convergence speed, that the other agents' policies significantly affect the learning speed, and that parallelization does improve the learning speed when the other agents' policies are pseudo-policies overwritten by the distribution of expert trajectories.
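The difference between the two update schemes can be sketched with a toy scalar example. This is an illustrative sketch only, not the authors' algorithm: the scalar rewards and policies, the `best_response` coupling, the learning rate, and the expert targets are all hypothetical stand-ins for the actual Markov-game quantities.

```python
# Toy sketch (not the paper's implementation) contrasting sequential
# coordinate descent with the parallel, synchronous variant.
# Each agent's "policy" and "reward" is a single scalar.

def best_response(own_reward, other_policy):
    # Hypothetical stand-in for the inner MARL step: an agent's policy
    # depends on its own reward and on the other agent's current policy.
    return 0.5 * own_reward + 0.5 * other_policy

def sequential_step(rewards, policies, expert, lr=0.5):
    # Coordinate descent: agents update one at a time, so agent 1
    # immediately sees agent 0's freshly updated policy.
    for i in range(2):
        policies[i] = best_response(rewards[i], policies[1 - i])
        rewards[i] += lr * (expert[i] - policies[i])

def parallel_step(rewards, policies, expert, lr=0.5):
    # Parallel variant: both agents update against a frozen snapshot of
    # the other's policy, then exchange the new policies synchronously.
    snapshot = list(policies)
    for i in range(2):
        policies[i] = best_response(rewards[i], snapshot[1 - i])
        rewards[i] += lr * (expert[i] - policies[i])

def run(step_fn, iters=200):
    rewards, policies, expert = [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]
    for _ in range(iters):
        step_fn(rewards, policies, expert)
    return policies

print(run(sequential_step))  # both entries approach the expert value 1.0
print(run(parallel_step))
```

In this toy setting both schemes converge to the expert target; the abstract's point is that in the actual Markov game, which scheme converges faster depends on how the other agents' policies are represented during the frozen-snapshot phase.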