Estimating a Reward Function That Maximizes Learning Efficiency in Inverse Reinforcement Learning

北里 勇樹; 荒井 幸代

doi:10.1541/ieejeiss.138.720

Abstract

Inverse reinforcement learning (IRL) is a promising framework for estimating a reward function from the observed behavior of an expert. However, the IRL problem is ill-posed: several different reward functions can reproduce the expert's behavior. Previous IRL studies have selected the most appropriate reward function based solely on how well it reproduces the expert's original behavior. This evaluation measure alone does not seem sufficient to narrow down the candidate reward functions. To select the most appropriate one among the alternatives, we introduce an additional objective function into the existing IRL algorithms of Ng et al. Specifically, we use learning efficiency as the additional objective, so that reinforcement learning (RL) converges faster, and we optimize it with a genetic algorithm. Consequently, our proposed IRL algorithm is guaranteed to output a reward function with which the agent acquires a policy that is both effective and optimal. We show the effectiveness of our approach by comparing the performance of the proposed method with that of previous algorithms.
