Organizer: The Japan Society of Mechanical Engineers
Conference: Robotics and Mechatronics Conference 2017
Dates: 2017/05/10 - 2017/05/13
This paper presents a method for generating trajectories for score-based inverse reinforcement learning. While most inverse reinforcement learning methods require expert demonstrations, score-based inverse reinforcement learning can estimate the expert’s reward function from scores assigned to arbitrary trajectories. This study introduces active learning into score-based inverse reinforcement learning so that the reward function can be estimated from few trajectories. An agent generates trajectories that are informative for reward estimation and poses them as queries to the expert. The proposed method generates the trajectories using the expectation of discounted accrued features, which is computed by dynamic programming. The informativeness of each trajectory is evaluated using criteria for queries. Simulation results in a cart-pole domain demonstrate that the proposed method efficiently estimates the reward function from few trajectories.
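As a rough illustration of the quantity named in the abstract, the expected discounted accrued features under a fixed policy can be computed by iterating the recursion mu(s) = phi(s) + gamma * sum over s' of P(s' | s) * mu(s'). The sketch below is not the authors' implementation: the tabular two-state model, the one-hot features, the discount factor, and the names `expected_discounted_features`, `P`, and `phi` are all assumptions chosen for illustration (the paper's cart-pole domain would need a discretized or approximated state space).

```python
def expected_discounted_features(P, phi, gamma, tol=1e-10, max_iter=10_000):
    """Iteratively solve mu(s) = phi(s) + gamma * sum_s' P[s][s'] * mu(s').

    P[s][s2] is the (assumed) transition probability under a fixed policy,
    phi[s] is the feature vector of state s, and gamma is in [0, 1).
    """
    n_states = len(P)
    n_features = len(phi[0])
    # Start from the immediate features and apply the backup until it stabilizes.
    mu = [list(phi[s]) for s in range(n_states)]
    for _ in range(max_iter):
        new_mu = [
            [
                phi[s][j]
                + gamma * sum(P[s][s2] * mu[s2][j] for s2 in range(n_states))
                for j in range(n_features)
            ]
            for s in range(n_states)
        ]
        # Largest change in any entry; stop once the backup is a near fixed point.
        delta = max(
            abs(new_mu[s][j] - mu[s][j])
            for s in range(n_states)
            for j in range(n_features)
        )
        mu = new_mu
        if delta < tol:
            break
    return mu


# Hypothetical toy model: state 0 always moves to state 1, state 1 is absorbing.
P = [[0.0, 1.0],
     [0.0, 1.0]]
phi = [[1.0, 0.0],   # one-hot feature for state 0
       [0.0, 1.0]]   # one-hot feature for state 1
mu = expected_discounted_features(P, phi, gamma=0.5)
print(mu)  # approximately [[1.0, 1.0], [0.0, 2.0]]
```

If the reward is assumed to be linear in the features, the inner product of a weight vector with `mu[s]` gives the expected discounted return from state `s`, which is one reason this quantity is convenient when comparing candidate trajectories.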