The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec)
Online ISSN : 2424-3124
2019
Session ID : 1P2-A11

Imitation Learning of Deformable Object Manipulation with Entropy-maximizing Dynamic Policy Programming
*Yoshihisa Tsurumine, Yunduan Cui, Kimitoshi Yamazaki, Takamitsu Matsubara

Abstract

Although value function-based Reinforcement Learning (RL) has been successfully applied to a variety of tasks, as has policy search, manually designing appropriate reward functions for complex tasks such as robotic cloth manipulation remains challenging and costly. Inspired by the recent success of Generative Adversarial Imitation Learning (GAIL) in policy search, which allows an agent to learn near-optimal behaviors from expert demonstrations without explicit reward function design, we explore an imitation learning framework for value function-based RL. The generator of GAIL requires both smoothness of the policy update and diversity of the learned policy. We therefore first propose a novel value function-based RL method, Entropy-maximizing Dynamic Policy Programming (EDPP), and then develop the corresponding imitation learning framework, P-GAIL. To investigate its performance, we applied P-GAIL to a handkerchief-flipping task.
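As a rough illustration of the two requirements placed on the generator, the following is a minimal sketch of an entropy-augmented, KL-regularized policy update in our own notation; the inverse temperatures \(\eta\) and \(\kappa\), the action value \(Q_t\), and the previous policy \(\pi_t\) are assumptions for illustration, since the abstract itself does not state the EDPP equations.

\[
\pi_{t+1}(\cdot \mid s) \;=\; \arg\max_{\pi}\;
\mathbb{E}_{a \sim \pi}\big[ Q_t(s,a) \big]
\;-\; \frac{1}{\eta}\,\mathrm{KL}\big( \pi(\cdot \mid s) \,\big\|\, \pi_t(\cdot \mid s) \big)
\;+\; \frac{1}{\kappa}\,\mathcal{H}\big( \pi(\cdot \mid s) \big),
\]

whose closed-form maximizer is the softmax

\[
\pi_{t+1}(a \mid s) \;\propto\;
\pi_t(a \mid s)^{\frac{\kappa}{\eta+\kappa}}
\exp\!\Big( \tfrac{\eta\kappa}{\eta+\kappa}\, Q_t(s,a) \Big),
\]

where the KL term keeps successive policies close (smoothness of the policy update) and the entropy term \(\mathcal{H}\) spreads probability mass over actions (diversity of the learned policy).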

© 2019 The Japan Society of Mechanical Engineers