Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
39th (2025)
Session ID : 1B3-OS-41a-05

Self and Opponent Modeling for Ensuring Markovian and Reward-Predictive Representations in Partially Observable Multi-Agent Environments
*Kai YAMASHITA, Masahiro SUZUKI, Yutaka MATSUO
Abstract

Recent advances in reinforcement learning for multi-agent environments have underscored the importance of opponent modeling, in which agents infer the internal states or strategies of their opponents. For opponent modeling in partially observable environments, recent studies have explored autoencoder-based latent representations that do not require access to opponent information during execution. In reinforcement learning, the state input to the policy and value function in a Markov decision process (MDP) must satisfy the Markov property and serve as a sufficient statistic for predicting future rewards. Under partial observability, however, many opponent-modeling approaches focus solely on reconstructing opponent information in the latent representation, without ensuring that the representation retains these Markovian or reward-predictive properties. To overcome this limitation, we propose a representation learning method that models not only the opponent but also the agent itself. We validated our method through experiments, demonstrating its effectiveness in improving reinforcement learning performance.
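Since this page carries only the abstract, the following is an illustrative PyTorch sketch, not the authors' implementation: a recurrent encoder over the agent's own observation-action history whose latent reconstructs opponent information (opponent modeling) and is additionally trained with reward-prediction and latent-dynamics heads, pushing it toward the Markovian, reward-predictive structure the abstract calls for. All names (SelfOpponentEncoder, auxiliary_losses) and the specific architecture are assumptions.

    import torch
    import torch.nn as nn

    class SelfOpponentEncoder(nn.Module):
        """Hypothetical self-and-opponent representation model.

        Encodes the agent's own observation-action history into a latent z;
        auxiliary heads decode opponent information, predict the immediate
        reward, and predict the next latent (one-step dynamics).
        """
        def __init__(self, obs_dim: int, act_dim: int, opp_dim: int, z_dim: int = 32):
            super().__init__()
            self.rnn = nn.GRU(obs_dim + act_dim, z_dim, batch_first=True)
            self.opp_decoder = nn.Linear(z_dim, opp_dim)           # reconstruct opponent info
            self.reward_head = nn.Linear(z_dim + act_dim, 1)       # predict r_t from (z_t, a_t)
            self.dynamics_head = nn.Linear(z_dim + act_dim, z_dim) # predict z_{t+1} from (z_t, a_t)

        def forward(self, obs_act_seq):
            # obs_act_seq: (batch, time, obs_dim + act_dim)
            z_seq, _ = self.rnn(obs_act_seq)
            return z_seq  # (batch, time, z_dim)

    def auxiliary_losses(model, obs_act_seq, act_seq, opp_seq, rew_seq):
        """Opponent reconstruction + reward prediction + latent consistency."""
        z = model(obs_act_seq)
        opp_loss = nn.functional.mse_loss(model.opp_decoder(z), opp_seq)
        za = torch.cat([z, act_seq], dim=-1)
        rew_loss = nn.functional.mse_loss(model.reward_head(za).squeeze(-1), rew_seq)
        # one-step latent consistency: predicted z_{t+1} vs. encoded z_{t+1}
        dyn_loss = nn.functional.mse_loss(model.dynamics_head(za)[:, :-1], z[:, 1:].detach())
        return opp_loss + rew_loss + dyn_loss

The reward and dynamics heads are one plausible way to encourage reward-predictive, Markovian latents; the paper itself may use a different auxiliary objective.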

© 2025 The Japanese Society for Artificial Intelligence