Organizer: The Japan Society of Mechanical Engineers (JSME)
Conference: Robotics and Mechatronics Conference 2019 (ROBOMECH 2019)
Dates: 2019/06/05 - 2019/06/08
Although value function-based Reinforcement Learning (RL) has been successfully applied to a variety of tasks, as has policy search, manually designing appropriate reward functions for complex tasks such as robotic cloth manipulation remains challenging and costly. Inspired by the recent success of Generative Adversarial Imitation Learning (GAIL) in policy search, which allows an agent to learn near-optimal behaviors from expert demonstrations without explicit reward function design, we explore an imitation learning framework for the value function-based RL approach. The generator in GAIL needs both smoothness of the policy update and diversity in the learned policy. We first propose a novel value function-based RL method, Entropy-maximizing DPP (EDPP), and then develop the corresponding imitation learning framework, P-GAIL. To investigate its performance, we applied P-GAIL to the task of flipping a handkerchief.
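The adversarial structure underlying GAIL-style imitation learning can be sketched as follows. This is a minimal toy illustration, not the proposed P-GAIL/EDPP method: the 1-D action space, the logistic discriminator, the Gaussian policy, and all learning rates are illustrative assumptions. A discriminator is trained to separate expert demonstrations from agent samples, and its output serves as a surrogate reward for the policy update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "expert" demonstrations: 1-D actions concentrated around 2.0.
expert_actions = rng.normal(loc=2.0, scale=0.1, size=256)

# Agent policy: Gaussian with a learnable mean (fixed std for simplicity).
policy_mean = 0.0
policy_std = 0.5

# Logistic discriminator D(a) = sigmoid(w*a + b): expert -> 1, agent -> 0.
w, b = 0.0, 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(300):
    agent_actions = rng.normal(policy_mean, policy_std, size=256)

    # Discriminator update: gradient ascent on the classification log-likelihood.
    for batch, label in ((expert_actions, 1.0), (agent_actions, 0.0)):
        p = sigmoid(w * batch + b)
        grad = label - p                      # gradient of log-likelihood w.r.t. logit
        w += 0.05 * np.mean(grad * batch)
        b += 0.05 * np.mean(grad)

    # Policy update: maximize the surrogate reward r(a) = log D(a)
    # via a REINFORCE-style gradient on the Gaussian mean.
    d_vals = sigmoid(w * agent_actions + b)
    reward = np.log(d_vals + 1e-8)
    reward = reward - reward.mean()           # baseline for variance reduction
    score = (agent_actions - policy_mean) / policy_std**2
    policy_mean += 0.05 * np.mean(reward * score)

# After training, the policy mean has been pulled toward the expert behavior.
```

The policy never observes a hand-designed reward; it only receives the discriminator's judgment, which is the property that motivates applying a GAIL-style framework to tasks like cloth manipulation where reward design is costly.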