The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec)
Online ISSN : 2424-3124
2017
Session ID : 2P2-E05

Self-generation of trajectories via dynamic programming for score-based inverse reinforcement learning
Natsumi WATANABE, Gakuto MASUYAMA, Kazunori UMEDA

Abstract

This paper presents a method for generating trajectories for score-based inverse reinforcement learning. Most inverse reinforcement learning methods require demonstrations by an expert. In contrast, score-based inverse reinforcement learning can estimate the expert's reward function from the scores the expert assigns to arbitrary trajectories. This study brings active learning into score-based inverse reinforcement learning so that a reward function can be estimated from a small number of trajectories. An agent generates trajectories that are informative for reward estimation and poses them to the expert as queries. The proposed method generates these trajectories using the expected discounted accrued features, which are computed by dynamic programming. The informativeness of each trajectory is evaluated with criteria for selecting queries. Simulation results in a cart-pole domain show that the proposed method estimates the reward function efficiently from a few trajectories.
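The abstract does not spell out the MDP representation, the features, or the reward model. The sketch below therefore makes several assumptions. It uses a small tabular MDP with known transitions and a fixed stochastic policy. It also assumes a linear reward r(s) = w · φ(s), under which a trajectory's score is w times its discounted feature counts. Under these assumptions, it shows how expected discounted accrued features can be computed by a dynamic-programming backup. The backup has the same form as policy evaluation, but it is applied to a feature vector instead of a scalar value.

The helper names `expected_discounted_features` and `fit_reward_weights` are hypothetical. The ordinary least-squares fit of w from scores is a stand-in and is not necessarily the paper's estimator. A continuous domain such as cart-pole would first need discretization or function approximation.

```python
import numpy as np


def expected_discounted_features(P, phi, policy, gamma=0.9, iters=500):
    """Expected discounted accrued features via dynamic programming (sketch).

    Assumed inputs (hypothetical tabular setting):
      P      : (S, A, S) array, P[s, a, s2] = Pr(s2 | s, a)
      phi    : (S, K) array of state features
      policy : (S, A) array, policy[s, a] = pi(a | s)
    Returns mu of shape (S, K), where mu[s] = E[sum_t gamma^t phi(s_t) | s_0 = s].
    """
    # Collapse actions: state-to-state transition matrix under the policy.
    P_pi = np.einsum("sa,sat->st", policy, P)
    mu = np.zeros(phi.shape)
    for _ in range(iters):
        # Bellman-style backup on a feature vector: mu <- phi + gamma * P_pi mu
        mu = phi + gamma * P_pi @ mu
    return mu


def fit_reward_weights(feature_counts, scores):
    """Least-squares fit of w such that score ~= w . (discounted feature counts).

    Assumes a linear reward; this is a simple stand-in estimator.
      feature_counts : (N, K) discounted feature counts of N scored trajectories
      scores         : (N,) scores given by the expert
    """
    w, *_ = np.linalg.lstsq(feature_counts, scores, rcond=None)
    return w


# Toy 2-state, 2-action MDP: action 0 stays in place, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
phi = np.eye(2)  # one-hot state features

stay = np.array([[1.0, 0.0], [1.0, 0.0]])
switch = np.array([[0.0, 1.0], [0.0, 1.0]])

mu_stay = expected_discounted_features(P, phi, stay, gamma=0.5)
mu_switch = expected_discounted_features(P, phi, switch, gamma=0.5)

# Hypothetical scored trajectories generated from a true weight vector [1, -1].
counts = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
scores = counts @ np.array([1.0, -1.0])
w_hat = fit_reward_weights(counts, scores)
```

Under the "stay" policy each state is revisited forever, so `mu_stay` approaches φ / (1 − γ) = 2·I. The iterative backup is used here because it mirrors the dynamic-programming view. In a small tabular case, solving (I − γ P_π) μ = φ directly gives the same fixed point.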

© 2017 The Japan Society of Mechanical Engineers