Host : The Japanese Society for Artificial Intelligence
Name : The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 37
Location : [in Japanese]
Date : June 06, 2023 - June 09, 2023
Inverse Reinforcement Learning (IRL) is a method for estimating underlying rewards from expert trajectories. IRL is used to imitate an expert through reinforcement learning in tasks where reward design is difficult, or to analyze the intentions of humans or other living organisms. Traditional IRL methods assume that expert trajectories are perfectly optimal, so sub-optimal trajectories lead to the estimation of a sub-optimal reward. Several IRL methods handle sub-optimal trajectories, and the dominant approach uses an optimality ranking of the trajectories. However, these methods are strongly affected by the accuracy of the ranking data. We therefore model the sub-optimal trajectory distribution as a mixture of the optimal trajectory distribution and outliers, and propose an IRL method based on the gamma divergence, which has the property of ignoring outliers. The proposed method can be applied to classification-based IRL methods and can be regarded as a generalization of the previously used cross-entropy-based methods. We evaluate the proposed method through computer experiments.
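The abstract does not state the loss itself. As a rough illustration of why a gamma-divergence objective can generalize cross-entropy while discounting outliers, the following is a minimal sketch that assumes a commonly used gamma cross-entropy form for a classifier's predicted class probabilities. The function names `cross_entropy` and `gamma_cross_entropy`, the toy probabilities, and the choice `gamma = 0.5` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the labelled class."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    p_true = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(p_true))


def gamma_cross_entropy(probs, labels, gamma):
    """Assumed gamma cross-entropy for classification (gamma > 0).

    Each sample contributes p(y_i | x_i)^gamma, normalised by
    (sum_y p(y | x_i)^(1 + gamma))^(gamma / (1 + gamma)).
    Samples the model finds very unlikely contribute almost nothing
    to the mean, which is what bounds their influence.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    p_true = probs[np.arange(len(labels)), labels]
    norm = np.sum(probs ** (1.0 + gamma), axis=1) ** (gamma / (1.0 + gamma))
    return -np.log(np.mean(p_true ** gamma / norm)) / gamma


# Toy data (hypothetical): three well-explained samples plus one outlier
# to which the classifier assigns almost zero probability.
clean_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
clean_labels = [0, 0, 0]
noisy_probs = clean_probs + [[1e-6, 1.0 - 1e-6]]
noisy_labels = clean_labels + [0]

ce_shift = cross_entropy(noisy_probs, noisy_labels) - cross_entropy(clean_probs, clean_labels)
gamma_shift = (gamma_cross_entropy(noisy_probs, noisy_labels, 0.5)
               - gamma_cross_entropy(clean_probs, clean_labels, 0.5))
print("cross-entropy shift caused by the outlier:", ce_shift)
print("gamma loss shift caused by the outlier:  ", gamma_shift)
```

Under this assumed form, the loss approaches ordinary cross-entropy as `gamma` tends to 0, which matches the abstract's description of a generalization. For `gamma > 0`, a single badly explained sample changes the loss by a bounded amount, whereas its cross-entropy contribution grows without limit as its probability goes to zero.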