2018, Vol. 138, No. 6, pp. 720-727
Inverse Reinforcement Learning (IRL) is a promising framework for estimating a reward function from the observed behavior of an expert. However, the IRL problem is ill-posed: several different reward functions can reproduce the expert's behavior. Previous IRL studies have selected the most appropriate reward function only by how well it reproduces the expert's original behavior. This evaluation measure alone is not sufficient to narrow down the candidate reward functions. To select the most appropriate one among the alternatives, we introduce an additional objective function into the existing IRL algorithm of Ng et al. Specifically, we use learning efficiency as the additional objective, so that RL converges faster, and optimize it with a Genetic Algorithm (GA). As a result, our proposed IRL algorithm is guaranteed to output a reward function with which the agent acquires a policy that is both effective and optimal. We demonstrate the effectiveness of our approach by comparing the performance of the proposed method with that of previous algorithms.
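The selection idea (keep only rewards that reproduce the expert, then prefer the one that is learned fastest) can be illustrated with a small sketch. Everything in it is an assumption for illustration and not the authors' method: a hypothetical 6-state chain MDP, an expert that always moves right, a consistency check done by value iteration instead of Ng et al.'s linear-programming constraints, and "learning efficiency" approximated by the number of value-iteration sweeps until the greedy policy settles on the expert's. A simple genetic algorithm (truncation selection, uniform crossover, Gaussian mutation) then searches over reward vectors; all function names and parameters are hypothetical.

```python
import random

# Hypothetical illustration only (not the paper's algorithm or parameters).
N_STATES = 6                    # chain MDP: states 0..5, state 5 is a terminal goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)              # action 0 = move left, action 1 = move right
GAMMA = 0.9
EXPERT = [1] * GOAL             # assumed expert: always move right in non-goal states


def step(s, a):
    """Deterministic transition, clipped at both ends of the chain."""
    return min(max(s + ACTIONS[a], 0), GOAL)


def q(s, a, V, r):
    # Assumption: reward r[s'] is received on entering s'; the goal is terminal (V[GOAL] = 0).
    s2 = step(s, a)
    return r[s2] + GAMMA * V[s2]


def greedy_policy(V, r):
    # Ties go to action 0, so "right" must be strictly better to count as expert-like.
    return [max(range(len(ACTIONS)), key=lambda a: q(s, a, V, r)) for s in range(GOAL)]


def sweeps_to_expert(r, max_sweeps=150):
    """Sweeps after which the greedy policy stays equal to EXPERT (efficiency proxy);
    None if the converged greedy policy differs, i.e. r does not reproduce the expert."""
    V = [0.0] * N_STATES
    last_mismatch = 0
    for k in range(1, max_sweeps + 1):
        # Synchronous Bellman optimality update; the comprehension reads the old V.
        V = [0.0 if s == GOAL else max(q(s, a, V, r) for a in range(len(ACTIONS)))
             for s in range(N_STATES)]
        if greedy_policy(V, r) != EXPERT:
            last_mismatch = k
    return None if last_mismatch == max_sweeps else last_mismatch + 1


def fitness(r):
    # Inconsistent rewards are infeasible; among consistent ones, fewer sweeps is better.
    k = sweeps_to_expert(r)
    return float("-inf") if k is None else -k


def genetic_search(pop_size=20, generations=30, sigma=0.5, seed=0):
    rng = random.Random(seed)
    sparse = [0.0] * GOAL + [1.0]   # baseline candidate: reward only at the goal
    pop = [sparse] + [[rng.uniform(-1, 1) for _ in range(N_STATES)]
                      for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                      # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            children.append([g + rng.gauss(0, sigma) for g in child])         # Gaussian mutation
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = genetic_search()
    print([round(x, 2) for x in best], sweeps_to_expert(best))
```

Because the goal-only reward is in the initial population and selection is elitist, the returned reward never needs more sweeps than that baseline. The check also shows why reproducing the expert must be verified: a dense reward such as `[0, 1, 2, 3, 4, 5]` is rejected, since at convergence the agent prefers cycling between states 3 and 4 over entering the goal.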