Estimation of Different Reward Functions Latent in Trajectory Data

Masaharu Saito; Sachiyo Arai

doi:10.20965/jaciii.2024.p0403

Abstract

In recent years, inverse reinforcement learning has attracted attention as a method for estimating the intentions behind actions from the trajectories of various agents, including human flow data. In the context of reinforcement learning, “intention” refers to a reward function. Conventional inverse reinforcement learning assumes that all trajectories are generated by policies learned under a single reward function. However, it is natural to assume that people in a human flow act according to multiple policies. In this study, we introduce an expectation-maximization algorithm into inverse reinforcement learning and propose a method for estimating different reward functions from the trajectories of a human flow. The effectiveness of the proposed method was evaluated through a computer experiment based on human flow data collected from subjects around airport gates.
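The abstract states only that an expectation-maximization (EM) algorithm is combined with inverse reinforcement learning. As a rough illustration of how EM can separate several reward functions, the Python sketch below alternates an E-step, which gives each trajectory a posterior over K candidate reward functions, with an M-step, which refits each reward using those posteriors as weights. It is not the authors' algorithm. It assumes linear rewards over trajectory features, deterministic dynamics, and a small enumerable set of candidate trajectories (so the maximum-entropy IRL normalizer can be computed exactly), and it takes gradient steps in the M-step (generalized EM). The function names, parameters, and toy data are hypothetical.

```python
import numpy as np


def traj_log_probs(theta, feats):
    """Log P(tau | theta) for every candidate trajectory (hypothetical helper).

    Assumption: deterministic dynamics, so under maximum-entropy IRL
    P(tau | theta) is proportional to exp(theta . f(tau)), normalized over
    the finite candidate set given by the rows of feats.
    """
    scores = feats @ theta
    m = scores.max()
    return scores - (m + np.log(np.exp(scores - m).sum()))


def em_multi_reward_irl(obs_idx, feats, n_rewards, n_iters=100, lr=0.5,
                        m_steps=20, seed=0):
    """Generalized EM over a mixture of linear reward functions (hypothetical API).

    obs_idx: index into feats of each observed trajectory.
    feats:   (n_candidates, n_features) feature counts of each candidate trajectory.
    Returns reward parameters (n_rewards, n_features), mixing weights,
    and the responsibilities from the last E-step.
    """
    rng = np.random.default_rng(seed)
    obs_feats = feats[obs_idx]
    thetas = rng.normal(scale=0.1, size=(n_rewards, feats.shape[1]))
    weights = np.full(n_rewards, 1.0 / n_rewards)
    for _ in range(n_iters):
        # E-step: posterior probability that reward k generated trajectory i.
        log_post = np.stack(
            [traj_log_probs(t, feats)[obs_idx] for t in thetas], axis=1
        ) + np.log(weights + 1e-12)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step for the mixing weights: closed-form update.
        weights = resp.mean(axis=0)

        # M-step for the reward parameters: gradient ascent on the
        # responsibility-weighted MaxEnt log-likelihood, whose gradient is
        # (weighted empirical feature counts) - (total weight) * (expected counts).
        for k in range(n_rewards):
            emp = resp[:, k] @ obs_feats
            total = resp[:, k].sum()
            for _ in range(m_steps):
                expected = np.exp(traj_log_probs(thetas[k], feats)) @ feats
                thetas[k] += lr * (emp - total * expected) / len(obs_idx)
    return thetas, weights, resp


# Toy usage (hypothetical data): four candidate paths described by two
# binary features; half of the observed agents take path 0, half take path 1.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
obs_idx = np.array([0] * 10 + [1] * 10)
thetas, weights, resp = em_multi_reward_irl(obs_idx, feats, n_rewards=2)
print(np.round(weights, 2), resp.argmax(axis=1))
```

In this toy data, a single reward cannot put high probability on both observed paths without also raising the probability of path 2 or path 3, so the two-component fit should assign the two groups of trajectories to different reward functions. Real human-flow data would instead call for state-level dynamics and soft value iteration in place of enumerating whole trajectories.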


This article is licensed under a Creative Commons [Attribution-NoDerivatives 4.0 International] license (https://creativecommons.org/licenses/by-nd/4.0/).
The journal is fully Open Access under Creative Commons licenses, and all articles are free to access on the JACIII official website.
https://www.fujipress.jp/jaciii/jc-about/#https://creativecommons.org/licenses/by-nd
