Inverse Reinforcement Learning for Sub-optimal Trajectories Based on Gamma Divergence

岸川 大航; 荒井 幸代

doi:10.11517/pjsai.JSAI2023.0_3D1GS201

Abstract

Inverse reinforcement learning (IRL) estimates the underlying reward from expert trajectories. It is used to imitate an expert through reinforcement learning in tasks where rewards are difficult to design, and to analyze the intentions of humans and other animals. Conventional IRL methods assume that expert trajectories are perfectly optimal, so sub-optimal trajectories lead to a sub-optimal estimated reward. Several IRL methods handle sub-optimal trajectories, but the dominant approach relies on an optimality ranking of the trajectories, which makes these methods highly sensitive to the accuracy of the ranking data. We therefore model the sub-optimal trajectory distribution as a mixture of the optimal trajectory distribution and outliers, and propose an IRL method based on the gamma divergence, which is robust to outliers (it effectively ignores them). The proposed method applies to classification-based IRL methods and can be regarded as a generalization of the cross-entropy-based losses those methods previously used. We evaluate the proposed method through computer experiments.
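To illustrate why a gamma-divergence loss can generalize cross-entropy while ignoring outliers, here is a minimal sketch of the gamma-cross-entropy commonly used in robust statistics, applied to a binary probabilistic classifier. It is not the authors' implementation. It assumes a classification-based IRL setup in which a discriminator outputs p_pos, the probability that a sample is "expert" (label 1) rather than "policy" (label 0); the helper names (gamma_cross_entropy, implicit_weights, and so on) and the toy numbers are hypothetical. As gamma approaches 0 the loss approaches ordinary binary cross-entropy, and for gamma > 0 each sample's weight in the gradient is proportional to p(y_i|x_i)^gamma (up to a normalizer), so samples the model finds implausible contribute little.

```python
import numpy as np

EPS = 1e-12  # clip probabilities away from 0 and 1 to keep logs finite


def _true_class_prob(p_pos, labels):
    """Return clipped p_pos and the probability assigned to each observed label."""
    p_pos = np.clip(np.asarray(p_pos, dtype=float), EPS, 1.0 - EPS)
    labels = np.asarray(labels)
    return p_pos, np.where(labels == 1, p_pos, 1.0 - p_pos)


def cross_entropy(p_pos, labels):
    """Standard binary cross-entropy (mean negative log-likelihood)."""
    _, p_true = _true_class_prob(p_pos, labels)
    return -np.mean(np.log(p_true))


def _gamma_ratios(p_pos, labels, gamma):
    """Per-sample terms p(y_i|x_i)^g / (sum_y p(y|x_i)^(1+g))^(g/(1+g))."""
    p_pos, p_true = _true_class_prob(p_pos, labels)
    norm = p_pos ** (1.0 + gamma) + (1.0 - p_pos) ** (1.0 + gamma)
    return p_true ** gamma / norm ** (gamma / (1.0 + gamma))


def gamma_cross_entropy(p_pos, labels, gamma):
    """Gamma-cross-entropy loss (gamma > 0); tends to cross_entropy as gamma -> 0."""
    return -np.log(np.mean(_gamma_ratios(p_pos, labels, gamma))) / gamma


def implicit_weights(p_pos, labels, gamma):
    """Relative weight of each sample in the gradient of gamma_cross_entropy."""
    r = _gamma_ratios(p_pos, labels, gamma)
    return r / r.sum()


# Hypothetical discriminator outputs; the last "expert" sample is one the
# classifier finds very implausible (p(y=1|x) = 0.01), i.e. a likely outlier.
p_pos = np.array([0.9, 0.8, 0.2, 0.01])
labels = np.array([1, 1, 0, 1])

print(cross_entropy(p_pos, labels), gamma_cross_entropy(p_pos, labels, 1e-6))
print(implicit_weights(p_pos, labels, 0.5))
```

The first line prints two nearly equal values (the gamma loss reduces to cross-entropy for tiny gamma); the second shows the outlier receiving a much smaller weight than the well-classified samples. The mean is taken inside the logarithm so that the per-sample terms act as normalized weights; larger gamma makes the loss more robust at the cost of some efficiency on clean data.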
