Self-Generation of Trajectories by Dynamic Programming for Score-Based Inverse Reinforcement Learning

渡邉 夏美; 増山 岳人; 梅田 和昇

doi:10.1299/jsmermd.2017.2P2-E05

Abstract

This paper presents a method for generating trajectories for score-based inverse reinforcement learning. While most inverse reinforcement learning methods require expert demonstrations, score-based inverse reinforcement learning can estimate the expert's reward function from scores assigned to arbitrary trajectories. This study introduces active learning into score-based inverse reinforcement learning so that a reward function can be estimated from few trajectories. An agent generates trajectories that are informative for reward estimation and poses them as queries to the expert. The proposed method generates the trajectories using the expectation of discounted accrued features, which is calculated by dynamic programming. The informativeness of the trajectories is evaluated by query criteria. Simulation results in a cart-pole domain demonstrate that the proposed method efficiently estimates the reward function from few trajectories.
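The expectation of discounted accrued features mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small tabular MDP with a known transition model and a fixed deterministic policy, whereas the paper works in a cart-pole domain. The names `P`, `phi`, `policy`, and `expected_discounted_features` are hypothetical. The sketch iterates the recursion mu(s) = phi(s) + gamma * sum over s' of P(s' | s, pi(s)) * mu(s') until it converges.

```python
import numpy as np

def expected_discounted_features(P, phi, policy, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Dynamic-programming estimate of expected discounted accrued features.

    P      : (n_states, n_actions, n_states) transition probabilities (assumed known)
    phi    : (n_states, n_features) feature vector of each state
    policy : (n_states,) action index chosen in each state (deterministic)
    """
    n_states = P.shape[0]
    # Transition matrix under the fixed policy: P_pi[s, s'] = P[s, policy[s], s'].
    P_pi = P[np.arange(n_states), policy]
    mu = np.zeros_like(phi, dtype=float)
    for _ in range(max_iter):
        # One backup of mu(s) = phi(s) + gamma * E[mu(s')].
        mu_next = phi + gamma * P_pi @ mu
        if np.max(np.abs(mu_next - mu)) < tol:
            return mu_next
        mu = mu_next
    return mu

# Toy two-state example: state 1 is absorbing, state 0 moves to state 1.
P = np.zeros((2, 2, 2))
P[0, 0] = [1.0, 0.0]  # state 0, action 0: stay in state 0
P[0, 1] = [0.0, 1.0]  # state 0, action 1: move to state 1
P[1, 0] = [0.0, 1.0]  # state 1 is absorbing under both actions
P[1, 1] = [0.0, 1.0]
phi = np.eye(2)              # one-hot state features
policy = np.array([1, 0])    # move from state 0, then remain in state 1
mu = expected_discounted_features(P, phi, policy, gamma=0.5)
```

If the reward is further assumed to be linear in the features, with a weight vector `theta`, then `mu @ theta` gives the expected discounted return from each state, which is one way such feature expectations could feed into a query criterion.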
