Organizer: The Japan Society of Mechanical Engineers (JSME)
Conference: Robotics and Mechatronics Conference 2019 (ROBOMECH 2019)
Dates: 2019/06/05 - 2019/06/08
Although value function-based Reinforcement Learning (RL) has been successfully applied to a variety of tasks, as has policy search, manually designing appropriate reward functions for complex tasks such as robotic cloth manipulation remains challenging and costly. Inspired by the recent success of Generative Adversarial Imitation Learning (GAIL) in policy search, which allows an agent to learn near-optimal behaviors from expert demonstrations without explicit reward function design, we explore an imitation learning framework for the value function-based RL approach. The generator in GAIL needs both smoothness of the policy update and diversity in the learned policy. We first propose a novel value function-based RL method, Entropy-maximizing DPP (EDPP), and then develop the corresponding imitation learning framework, P-GAIL. To investigate its performance, we applied P-GAIL to the task of flipping a handkerchief.
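The adversarial structure underlying GAIL-style imitation learning can be sketched as follows. This is a minimal toy illustration, not the proposed P-GAIL/EDPP method: the 1-D action space, the logistic discriminator, the Gaussian policy, and all learning rates are illustrative assumptions. A discriminator is trained to separate expert demonstrations from agent samples, and its output serves as a surrogate reward for the policy update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "expert" demonstrations: 1-D actions concentrated around 2.0.
expert_actions = rng.normal(loc=2.0, scale=0.1, size=256)

# Agent policy: Gaussian with a learnable mean (fixed std for simplicity).
policy_mean = 0.0
policy_std = 0.5

# Logistic discriminator D(a) = sigmoid(w*a + b): expert -> 1, agent -> 0.
w, b = 0.0, 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(300):
    agent_actions = rng.normal(policy_mean, policy_std, size=256)

    # Discriminator update: gradient ascent on the classification log-likelihood.
    for batch, label in ((expert_actions, 1.0), (agent_actions, 0.0)):
        p = sigmoid(w * batch + b)
        grad = label - p                      # gradient of log-likelihood w.r.t. logit
        w += 0.05 * np.mean(grad * batch)
        b += 0.05 * np.mean(grad)

    # Policy update: maximize the surrogate reward r(a) = log D(a)
    # via a REINFORCE-style gradient on the Gaussian mean.
    d_vals = sigmoid(w * agent_actions + b)
    reward = np.log(d_vals + 1e-8)
    reward = reward - reward.mean()           # baseline for variance reduction
    score = (agent_actions - policy_mean) / policy_std**2
    policy_mean += 0.05 * np.mean(reward * score)

# After training, the policy mean has been pulled toward the expert behavior.
```

The policy never observes a hand-designed reward; it only receives the discriminator's judgment, which is the property that motivates applying a GAIL-style framework to tasks like cloth manipulation where reward design is costly.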