The Proceedings of JSME Annual Conference on Robotics and Mechatronics (Robomec)
Online ISSN : 2424-3124
Session ID: 1P2-A11

Imitation Learning for Deformable Object Manipulation via Maximum-Entropy Dynamic Policy Programming
*鶴峯 義久 (Yoshihisa Tsurumine), 崔 允端 (Yunduan Cui), 山崎 公俊 (Kimitoshi Yamazaki), 松原 崇充 (Takamitsu Matsubara)

Abstract

Value function-based reinforcement learning (RL), like policy search, has been successfully applied to a variety of tasks. However, manually designing appropriate reward functions for complex tasks such as robotic cloth manipulation remains challenging and costly. Inspired by the recent success of Generative Adversarial Imitation Learning (GAIL) in policy search, which allows an agent to learn near-optimal behaviors from expert demonstrations without an explicitly designed reward function, we explore an imitation learning framework for value function-based RL. The generator in GAIL must provide both smooth policy updates and diversity in the learned policy. We first propose a novel value function-based RL method, Entropy-maximizing Dynamic Policy Programming (EDPP). We then develop the corresponding imitation learning framework, P-GAIL. To evaluate its performance, we applied P-GAIL to a handkerchief-flipping task.
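The abstract does not state the EDPP update rule. As a generic illustration only (not the authors' EDPP), the sketch below shows one standard way to get both properties named above in a single-state tabular policy update: a KL penalty toward the previous policy (weight `eta`) keeps the update smooth, as in DPP, and an entropy bonus (weight `kappa`) keeps the policy diverse. The function names and the weights `eta` and `kappa` are hypothetical.

```python
import math
from typing import List


def regularized_policy_update(prev_policy: List[float], q_values: List[float],
                              eta: float, kappa: float) -> List[float]:
    """Hypothetical single-state update (illustration, not EDPP itself).

    Solves  max_pi  E_pi[Q] - eta * KL(pi || prev_policy) + kappa * H(pi),
    whose closed form is
        pi(a) ∝ prev_policy(a) ** (eta / (eta + kappa)) * exp(Q(a) / (eta + kappa)).
    prev_policy must be strictly positive on every action.
    """
    if eta < 0 or kappa < 0 or eta + kappa == 0:
        raise ValueError("need eta >= 0, kappa >= 0 and eta + kappa > 0")
    temp = eta + kappa
    # Logits of the closed-form solution, computed in log space.
    logits = [(eta / temp) * math.log(p) + q / temp
              for p, q in zip(prev_policy, q_values)]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(policy: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in policy if p > 0)


if __name__ == "__main__":
    uniform = [1 / 3, 1 / 3, 1 / 3]
    q = [1.0, 0.0, 0.0]
    greedy_ish = regularized_policy_update(uniform, q, eta=1.0, kappa=0.0)
    diverse = regularized_policy_update(uniform, q, eta=1.0, kappa=1.0)
    print(greedy_ish, entropy(greedy_ish))
    print(diverse, entropy(diverse))
```

With `kappa = 0` the rule reduces to a KL-only update, `pi ∝ prev_policy * exp(Q / eta)`, so a large `eta` keeps the new policy close to the old one; raising `kappa` flattens the result and increases its entropy.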

© 2019 The Japan Society of Mechanical Engineers