Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 3D1-GS-2-01

Gamma Divergence-Based Inverse Reinforcement Learning for Sub-Optimal Trajectories
*Daiko KISHIKAWA, Sachiyo ARAI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Inverse reinforcement learning (IRL) estimates the underlying reward from expert trajectories. It is used to imitate an expert through reinforcement learning in tasks where reward design is difficult, and to analyze the intentions of humans or other living organisms. Traditional IRL methods assume that expert trajectories are perfectly optimal, so sub-optimal trajectories lead to a sub-optimal reward estimate. Several IRL methods address sub-optimal trajectories, and the dominant approach relies on an optimality ranking of the trajectories. However, these methods are strongly affected by the accuracy of the ranking data. We therefore model the sub-optimal trajectory distribution as a mixture of the optimal trajectory distribution and outliers, and propose an IRL method based on gamma divergence, which has the property of ignoring outliers. The proposed method can be applied to classification-based IRL methods and can be regarded as a generalization of the previously used cross-entropy-based methods. We evaluate the proposed method through computer experiments.
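
The relationship between a gamma-divergence loss and cross-entropy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss used by the proposed method, so the code below assumes the standard gamma cross-entropy for a conditional (classification) model; the function names, the toy probabilities, and the choice gamma = 0.5 are hypothetical. It shows two things: the gamma loss approaches the ordinary cross-entropy as gamma goes to 0, and a single badly mislabeled sample (an outlier) raises the gamma loss far less than it raises the cross-entropy.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    p_true = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(p_true))

def gamma_cross_entropy(probs, labels, gamma):
    """Empirical gamma cross-entropy for a classifier (assumed form).

    probs  : (n, k) array of predicted class probabilities
    labels : (n,) array of integer class labels
    gamma  : robustness parameter, gamma > 0
    """
    p_true = probs[np.arange(len(labels)), labels]
    # Per-sample normalizer: (sum_y p(y|x)^(1+gamma))^(gamma/(1+gamma))
    norm = np.sum(probs ** (1.0 + gamma), axis=1) ** (gamma / (1.0 + gamma))
    return -np.log(np.mean(p_true ** gamma / norm)) / gamma

# Toy binary data: probability assigned to class 1 for each sample.
p1 = np.array([0.9, 0.8, 0.95, 0.001])
probs = np.stack([1.0 - p1, p1], axis=1)
labels = np.array([1, 1, 1, 1])  # the last sample acts as an outlier

clean_probs, clean_labels = probs[:3], labels[:3]

# Increase in each loss caused by adding the outlier.
ce_increase = cross_entropy(probs, labels) - cross_entropy(clean_probs, clean_labels)
gamma_increase = (gamma_cross_entropy(probs, labels, 0.5)
                  - gamma_cross_entropy(clean_probs, clean_labels, 0.5))

# Gap between the two losses for a very small gamma.
limit_gap = abs(gamma_cross_entropy(probs, labels, 1e-6)
                - cross_entropy(probs, labels))

print(f"cross-entropy increase from outlier: {ce_increase:.3f}")
print(f"gamma loss increase from outlier:    {gamma_increase:.3f}")
print(f"|gamma loss - cross-entropy| at gamma=1e-6: {limit_gap:.2e}")
```

In this sketch, a sample with a very small predicted probability contributes almost nothing to the mean inside the logarithm, which is one way to read the "ignoring outliers" property; how the proposed method embeds such a loss in a classification-based IRL algorithm is not specified in the abstract.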

© 2023 The Japanese Society for Artificial Intelligence