Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Imitation learning solves reinforcement learning problems by using information provided by a teacher. Behavior cloning, the standard approach, cannot be applied effectively to long-horizon tasks because errors caused by covariate shift accumulate over time. Interactive imitation learning addresses this problem by obtaining online feedback from a teacher model. Furthermore, even when the teacher is suboptimal, for example when the teacher's task is not exactly the same as the student's, using the student's reward information makes it possible to learn faster than with reinforcement learning alone and even to surpass the teacher. However, interactive imitation learning requires a teacher that can respond online, which limits the range of usable teachers. In particular, efficient interactive imitation learning requires the teacher's value function, so applicable teachers are limited to models trained by reinforcement learning. In this study, we propose a method that extends efficient, value-function-based interactive imitation learning to teachers for which only offline trajectory data are available.
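To make the role of the teacher's value function concrete, the sketch below shows one generic way such a function could be approximated from offline trajectories alone and then used to reshape the student's reward. It is not the method proposed in this study. It assumes a small discrete state space and assumes the offline trajectories record a per-step reward, which the abstract does not state. The helper names `mc_value_from_trajectories` and `shaped_reward` are hypothetical. The value estimate uses every-visit Monte Carlo return averaging, and the reward uses potential-based shaping, r + γV(s') − V(s).

```python
from collections import defaultdict


def mc_value_from_trajectories(trajectories, gamma=0.99):
    """Every-visit Monte Carlo estimate of a teacher value function V(s).

    Hypothetical data format (assumption): each trajectory is a list of
    (state, reward) pairs recorded offline from the teacher.
    """
    totals = defaultdict(float)  # sum of observed returns per state
    counts = defaultdict(int)    # number of visits per state
    for episode in trajectories:
        g = 0.0
        # Walk the episode backwards so g is the discounted return-to-go.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            totals[state] += g
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


def shaped_reward(r, s, s_next, value, gamma=0.99, done=False):
    """Potential-based shaping of the student's reward with the teacher's V.

    States never seen in the teacher data get V = 0 (an assumption), and the
    potential of a terminal next state is taken to be 0.
    """
    v_next = 0.0 if done else value.get(s_next, 0.0)
    return r + gamma * v_next - value.get(s, 0.0)


if __name__ == "__main__":
    # Two toy teacher episodes over states "a" -> "b" -> "c".
    data = [[("a", 0.0), ("b", 0.0), ("c", 1.0)], [("b", 0.0), ("c", 1.0)]]
    V = mc_value_from_trajectories(data, gamma=0.5)
    print(V["a"], V["b"], V["c"])  # 0.25 0.5 1.0
    # A student transition that skips from "a" straight to "c".
    print(shaped_reward(0.0, "a", "c", V, gamma=0.5))  # 0.25
```

Potential-based shaping leaves the optimal policies of the student's own task unchanged, which is one reason a suboptimal teacher's value estimate can speed up learning without capping the student at the teacher's level. Practical value-based methods typically use function approximation and may truncate the horizon instead; the tabular form here only illustrates the idea.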