Inverse Reinforcement Learning with Agents’ Biased Exploration Based on Sub-Optimal Sequential Action Data

Fumito Uwano; Satoshi Hasegawa; Keiki Takadama

doi:10.20965/jaciii.2024.p0380

Special Issue on Cutting Edge of Reinforcement Learning and its Hybrid Methods

Keywords: inverse reinforcement learning, data generation, reward design, sub-optimal data

Journal, open access

2024, Volume 28, Issue 2, pp. 380-392

DOI https://doi.org/10.20965/jaciii.2024.p0380

Abstract

Inverse reinforcement learning (IRL) estimates a reward function that lets an agent behave in accordance with expert data, e.g., human operation data. However, expert data usually contain redundant parts, which degrade the agent’s performance. This study extends IRL to sub-optimal action data, including data with missing parts (lack) and detours. The proposed method searches for new actions to determine optimal expert action data. Maze problems with sub-optimal expert action data were used to investigate the performance of the proposed method. The experimental results show that the proposed method finds optimal expert data better than the conventional method, and that the proposed search mechanisms perform better than random search.

Funding information

1. Funding agency/program: Japan Society for the Promotion of Science

2. Funding agency/program: Azbil Yamatake General Foundation