Bayesian Inverse Reinforcement Learning for Estimating Rewards from Trajectories Generated by an Expert in Multiple Environments

中田 勇介; 荒井 幸代

doi:10.11517/pjsai.JSAI2019.0_2Q5J201

Abstract

Reinforcement learning has achieved numerous successes, but it requires careful specification of a reward function that represents the objective of a problem. For some problems, the objective is difficult to express as a function, yet it is easy for an expert to provide demonstrations. For such problems, inverse reinforcement learning is useful because it estimates a reward function from an expert's demonstrations. Most existing inverse reinforcement learning methods assume that the expert provides demonstrations in a single fixed environment, but an expert can provide demonstrations for the same objective in multiple environments. For example, the objective of car driving is difficult to express as a function, while a driver can provide demonstrations in many different situations. In such cases, it is natural to use the demonstrations from all of these environments to estimate the expert's reward. We formulate this problem and propose an algorithm for it based on Bayesian inverse reinforcement learning. Experimental results show that the proposed method quantitatively outperforms the existing method.
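The abstract does not give the model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that all environments share the same states, actions, and a state-based reward R but differ in their transition dynamics; that the expert is Boltzmann-rational in each environment k, i.e. P(a | s, R) is proportional to exp(alpha * Q*_k(s, a; R)), the likelihood commonly used in Bayesian IRL; a Gaussian prior on R; and a random-walk Metropolis-Hastings sampler. Pooling then amounts to log P(R | D_1, ..., D_K) = log P(R) + sum over k of log P(D_k | R) + const. All function names, the `alpha` temperature, the data layout, and the toy chain are hypothetical.

```python
import math
import random

# Hypothetical data layout (not from the paper):
#   transitions[s][a] -> list of (probability, next_state)
#   demos             -> list of (state, action) pairs from one environment
# All environments share the state space, the action set, and the reward R.


def q_values(transitions, reward, gamma=0.8, tol=1e-8):
    """Optimal Q-values for a state-based reward, via value iteration."""
    n_states = len(transitions)
    values = [0.0] * n_states
    while True:
        q = [[reward[s] + gamma * sum(p * values[t] for p, t in outcomes)
              for outcomes in transitions[s]]
             for s in range(n_states)]
        new_values = [max(row) for row in q]
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return q
        values = new_values


def log_likelihood(transitions, demos, reward, alpha):
    """Assumed Boltzmann-rational expert: P(a | s, R) ~ exp(alpha * Q*(s, a))."""
    q = q_values(transitions, reward)
    total = 0.0
    for s, a in demos:
        scaled = [alpha * x for x in q[s]]
        m = max(scaled)  # log-sum-exp shift for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
        total += scaled[a] - log_z
    return total


def log_posterior(envs, reward, alpha=3.0, prior_std=1.0):
    """Gaussian prior on R plus the log-likelihoods summed over all environments."""
    log_prior = -sum(r * r for r in reward) / (2.0 * prior_std ** 2)
    return log_prior + sum(log_likelihood(t, d, reward, alpha) for t, d in envs)


def sample_reward_posterior(envs, n_states, n_samples=2000, burn_in=500,
                            step=0.3, seed=0):
    """Random-walk Metropolis-Hastings over R; returns the posterior-mean reward."""
    rng = random.Random(seed)
    current = [0.0] * n_states
    current_lp = log_posterior(envs, current)
    total = [0.0] * n_states
    for i in range(n_samples):
        proposal = [r + rng.gauss(0.0, step) for r in current]
        proposal_lp = log_posterior(envs, proposal)
        # The proposal is symmetric, so acceptance uses only the posterior ratio.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            total = [acc + r for acc, r in zip(total, current)]
    return [acc / (n_samples - burn_in) for acc in total]


def chain(swap):
    """Hypothetical 3-state chain; swap=True exchanges what actions 0 and 1 do."""
    def move(s, direction):
        return max(0, min(2, s + direction))
    a0_dir, a1_dir = (1, -1) if swap else (-1, 1)
    return [[[(1.0, move(s, a0_dir))], [(1.0, move(s, a1_dir))]] for s in range(3)]


# Same goal (reach state 2) demonstrated in two environments with different dynamics.
envs = [
    (chain(swap=False), [(0, 1), (1, 1), (2, 1)]),  # action 1 moves right here
    (chain(swap=True), [(0, 0), (1, 0), (2, 0)]),   # action 0 moves right here
]
estimate = sample_reward_posterior(envs, n_states=3)
print([round(r, 2) for r in estimate])  # state 2 should receive the largest reward
```

Because each environment has its own Q*_k, a candidate reward must explain the expert's choices under every set of dynamics at once. In the toy example the two environments swap the effect of the actions, so only a reward placed on the goal state, rather than on a particular action label, is consistent with both sets of demonstrations.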
