Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
Recently, reinforcement learning (RL) has achieved high performance on a variety of complex decision-making and control tasks, but it requires careful engineering of reward functions to solve real-world tasks. Inverse reinforcement learning (IRL) is a framework for constructing reward functions by learning from demonstrations; however, the estimated reward function cannot be transferred to other dynamics because of its dynamics-dependent indefiniteness. To obtain transferable reward functions, we propose a novel mathematical formulation that resolves this dynamics-dependent indefiniteness by utilizing demonstrations generated under multiple dynamics. We also show that the existing discussion of the indefiniteness of reward functions generalizes from standard RL to maximum entropy RL, which serves as the forward-solver subroutine in typical IRL algorithms based on maximum entropy IRL.
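The indefiniteness above can be made concrete with the classical potential-based shaping result: within a single fixed dynamics, a reward r and its shaped version r'(s,a,s') = r(s,a,s') + γΦ(s') − Φ(s) induce the same optimal policy for any potential Φ, so demonstrations from one dynamics cannot distinguish them. The sketch below is only an illustration of that ambiguity on a hypothetical toy chain MDP (the chain, the potential values, and the value-iteration solver are our own assumptions, not the paper's formulation).

```python
import numpy as np

# Toy illustration (assumed example, not the paper's method): on a fixed
# dynamics, a reward and any potential-shaped variant of it yield the same
# optimal policy, so the reward is only identified up to shaping.

n_states, gamma = 5, 0.9
actions = [-1, +1]  # move left / right on a deterministic chain


def step(s, a):
    # Deterministic chain dynamics, clipped at the boundaries.
    return min(max(s + a, 0), n_states - 1)


def optimal_policy(reward):
    """Greedy policy obtained by value iteration on the chain MDP."""
    V = np.zeros(n_states)
    for _ in range(500):
        Q = np.array([[reward(s, a, step(s, a)) + gamma * V[step(s, a)]
                       for a in actions] for s in range(n_states)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)


# Base reward: +1 for being at the rightmost state.
base = lambda s, a, s2: 1.0 if s2 == n_states - 1 else 0.0

# Shaped reward r'(s,a,s') = r(s,a,s') + gamma*Phi(s') - Phi(s)
# with an arbitrary potential Phi.
Phi = np.array([0.0, 3.0, -2.0, 1.0, 0.5])
shaped = lambda s, a, s2: base(s, a, s2) + gamma * Phi[s2] - Phi[s]

print(optimal_policy(base))
print(optimal_policy(shaped))  # identical greedy policy under this dynamics
```

Because Q'(s,a) = Q(s,a) − Φ(s), the argmax over actions is unchanged; distinguishing r from r' requires demonstrations under a second dynamics, which is the role multiple dynamics play in the proposed formulation.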