Host : The Japanese Society for Artificial Intelligence
Name : The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 2019
Number : 33
Location : [in Japanese]
Date : June 04, 2019 - June 07, 2019
In this work, we propose a new imitation learning framework designed to infer multiple reward functions. We introduce latent variables into both the discriminator and the generator of Generative Adversarial Imitation Learning (GAIL) so that different reward functions and policies can be learned for different tasks. To control the balance between imitating the expert directly (early convergence) and increasing the variance of the policy (sampling diverse data and learning a robust reward), we add an entropy-regularized correction term to the generator's objective function. We show that the objective function has a unique optimal solution, following the same argument as in GAIL. In experiments on a grid-world problem, we show that our framework can efficiently infer multiple reward functions and policies that represent different tasks.
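As a rough illustrative sketch (not the authors' exact formulation), a latent-conditioned, entropy-regularized GAIL objective of the kind described above could be written as follows; the symbols c (latent task variable), p(c) (latent prior), pi_theta (latent-conditioned policy), pi_E (expert policy), D_w (latent-conditioned discriminator), and lambda (entropy weight) are assumptions introduced here for illustration only.

% Hedged sketch of a latent-conditioned GAIL objective with an entropy correction term.
% Notation (c, p(c), \pi_\theta, \pi_E, D_w, \lambda) is illustrative, not the paper's own.
\begin{equation*}
\min_{\theta} \max_{w} \;
\mathbb{E}_{c \sim p(c)} \Big[
  \mathbb{E}_{(s,a) \sim \pi_{\theta}(\cdot \mid \cdot, c)} \big[ \log D_{w}(s, a, c) \big]
  + \mathbb{E}_{(s,a) \sim \pi_{E}} \big[ \log \big( 1 - D_{w}(s, a, c) \big) \big]
  - \lambda \, H\big( \pi_{\theta}(\cdot \mid \cdot, c) \big)
\Big]
\end{equation*}

Under this reading, the entropy term plays the role of the correction term in the generator's objective: a larger lambda favors more stochastic, exploratory policies (more diverse samples and a more robust reward), while a smaller lambda favors direct imitation of the expert and earlier convergence.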