The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec)
Online ISSN : 2424-3124
2019
Session ID : 1P2-A11

Imitation Learning of Deformable Object Manipulation with Entropy-maximizing Dynamic Policy Programming
*Yoshihisa Tsurumine, Yunduan Cui, Kimitoshi Yamazaki, Takamitsu Matsubara

Abstract

Although value function-based Reinforcement Learning (RL) has been successfully applied to a variety of tasks, as has policy search, manually designing appropriate reward functions for complex tasks such as robotic cloth manipulation remains challenging and costly. Inspired by the recent success of Generative Adversarial Imitation Learning (GAIL) in policy search, which allows an agent to learn near-optimal behaviors from expert demonstrations without explicit reward function design, we explore an imitation learning framework for value function-based RL. The generator of GAIL requires both smoothness of the policy update and diversity of the learned policy. We therefore first propose a novel value function-based RL method, Entropy-maximizing Dynamic Policy Programming (EDPP), and then develop the corresponding imitation learning framework, P-GAIL. To investigate its performance, we applied P-GAIL to a handkerchief-flipping task.
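As a rough illustration of the two requirements placed on the generator, the following is a minimal sketch of an entropy-augmented, KL-regularized policy update in our own notation; the inverse temperatures \(\eta\) and \(\kappa\), the action value \(Q_t\), and the previous policy \(\pi_t\) are assumptions for illustration, since the abstract itself does not state the EDPP equations.

\[
\pi_{t+1}(\cdot \mid s) \;=\; \arg\max_{\pi}\;
\mathbb{E}_{a \sim \pi}\big[ Q_t(s,a) \big]
\;-\; \frac{1}{\eta}\,\mathrm{KL}\big( \pi(\cdot \mid s) \,\big\|\, \pi_t(\cdot \mid s) \big)
\;+\; \frac{1}{\kappa}\,\mathcal{H}\big( \pi(\cdot \mid s) \big),
\]

whose closed-form maximizer is the softmax

\[
\pi_{t+1}(a \mid s) \;\propto\;
\pi_t(a \mid s)^{\frac{\kappa}{\eta+\kappa}}
\exp\!\Big( \tfrac{\eta\kappa}{\eta+\kappa}\, Q_t(s,a) \Big),
\]

where the KL term keeps successive policies close (smoothness of the policy update) and the entropy term \(\mathcal{H}\) spreads probability mass over actions (diversity of the learned policy).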

© 2019 The Japan Society of Mechanical Engineers