Proceedings of the JSME Annual Conference on Robotics and Mechatronics (ROBOMECH)
Online ISSN : 2424-3124
セッションID: 2P2-E05

Self-Generation of Trajectories by Dynamic Programming for Score-Based Inverse Reinforcement Learning
渡邉 夏美, 増山 岳人, 梅田 和昇
Abstract

This paper presents a trajectory generation method for score-based inverse reinforcement learning. While most inverse reinforcement learning methods require demonstrations from an expert, score-based inverse reinforcement learning can estimate the expert's reward function from scores assigned to arbitrary trajectories. This study introduces active learning into score-based inverse reinforcement learning so that a reward function can be estimated from a small number of trajectories. The agent generates trajectories that are informative for reward estimation and poses them as queries to the expert. The proposed method generates these trajectories using the expectation of discounted accrued features, which is computed by dynamic programming, and evaluates their informativeness with query-selection criteria. Simulation results in a cart-pole domain demonstrate that the proposed method efficiently estimates the reward function from a small number of trajectories.
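The abstract states that candidate trajectories are generated from the expectation of discounted accrued features computed by dynamic programming. As a rough illustration of that quantity only, the sketch below computes discounted feature expectations for a small tabular MDP under a fixed policy; the arrays `P`, `phi`, `policy`, and the weights `w` are hypothetical and not taken from the paper, and the cart-pole setup and query-selection criteria are not reproduced here.

```python
import numpy as np

def discounted_feature_expectations(P, phi, policy, gamma=0.95, n_iters=500):
    """Compute mu[s] = E[ sum_t gamma^t * phi(s_t) | s_0 = s, policy ]
    by dynamic programming (fixed-point iteration) on a tabular MDP.

    P      : (S, A, S) transition probabilities (illustrative)
    phi    : (S, F) feature matrix; phi[s] is the feature vector of state s
    policy : (S,) deterministic action index for each state
    """
    S, A, _ = P.shape
    F = phi.shape[1]
    mu = np.zeros((S, F))
    P_pi = P[np.arange(S), policy]        # (S, S) transitions under the fixed policy
    for _ in range(n_iters):
        # Bellman-style backup on features: mu(s) = phi(s) + gamma * E[mu(s')]
        mu = phi + gamma * P_pi @ mu
    return mu

# Toy 2-state, 2-action MDP (all numbers illustrative)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
phi = np.array([[1.0, 0.0],
                [0.0, 1.0]])
policy = np.array([0, 1])
mu = discounted_feature_expectations(P, phi, policy)

w = np.array([1.0, -1.0])   # hypothetical linear reward weights
print(mu @ w)               # expected discounted return from each start state
```

Under a linear reward model, such feature expectations let candidate trajectories (or policies) be compared and ranked before being posed as score queries to the expert, which is the role dynamic programming plays in the abstract's description.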

© 2017 The Japan Society of Mechanical Engineers