Interactive Imitation Learning Based on a Teacher's Offline Data

中口 悠輝

doi:10.11517/pjsai.JSAI2024.0_1B4GS204

Abstract

Imitation learning solves reinforcement learning problems by drawing on information from a teacher. The standard approach, behavior cloning, cannot be applied to long-horizon tasks because covariate shift accumulates over time; interactive imitation learning addresses this by obtaining online feedback from a teacher model. Furthermore, even when the teacher is suboptimal, for example when the teacher's and student's tasks are not exactly the same, access to the student's reward information makes it possible to learn faster than reinforcement learning alone and even to surpass the teacher. However, interactive imitation learning requires a teacher that can respond online, which limits the teachers to which it can be applied. In particular, efficient interactive imitation learning requires the teacher's value function, so applicable teachers are limited to models trained by reinforcement learning. In this study, we propose a method that extends efficient interactive imitation learning, which requires a value function, to teachers for which only offline trajectory data are available.
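The abstract does not say how a value function would be obtained from offline trajectories, so the following is only a minimal illustrative sketch and not the proposed method. It assumes a small discrete state space and trajectories recorded as (state, reward) pairs, and uses an every-visit Monte Carlo average of discounted returns. The function name `monte_carlo_values`, the data layout, and the discount factor are all hypothetical choices made for this example.

```python
from collections import defaultdict


def monte_carlo_values(trajectories, gamma=0.99):
    """Every-visit Monte Carlo estimate of V(s) from offline trajectories.

    Each trajectory is a list of (state, reward) pairs in time order.
    This is an illustrative assumption, not the method from the paper.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for trajectory in trajectories:
        ret = 0.0
        # Walk backwards so the discounted return accumulates in one pass.
        for state, reward in reversed(trajectory):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


# Hypothetical teacher data: two short trajectories over states "s0" and "s1".
teacher_data = [
    [("s0", 0.0), ("s1", 1.0)],
    [("s0", 0.0), ("s1", 0.0)],
]
values = monte_carlo_values(teacher_data, gamma=0.5)
```

A tabular average like this only covers states that appear in the teacher's data; with continuous states one would presumably fit a function approximator to the same return targets instead.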
