Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
39th (2025)
Session ID : 1B3-OS-41a-05

Self and Opponent Modeling for Ensuring Markovian and Reward-Predictive Representations in Partially Observable Multi-Agent Environments
*Kai YAMASHITA, Masahiro SUZUKI, Yutaka MATSUO
Abstract

Recent advances in reinforcement learning for multi-agent environments have underscored the importance of opponent modeling, in which agents infer the internal states or strategies of their opponents. For opponent modeling in partially observable environments, recent studies have explored autoencoder-based latent representations that do not require access to opponent information during execution. In reinforcement learning, the state input to the policy and value function in a Markov decision process (MDP) must satisfy the Markov property and serve as a sufficient statistic for predicting future rewards. Under partial observability, however, many opponent-modeling approaches focus solely on reconstructing opponent information in the latent representation, without ensuring that the representation retains these Markovian or reward-predictive properties. To overcome this limitation, we propose a representation learning method that models not only the opponent but also the agent itself. We validated our method through experiments, demonstrating its effectiveness in improving reinforcement learning performance.
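Since this page carries only the abstract, the following is an illustrative PyTorch sketch, not the authors' implementation: a recurrent encoder over the agent's own observation-action history whose latent reconstructs opponent information (opponent modeling) and is additionally trained with reward-prediction and latent-dynamics heads, pushing it toward the Markovian, reward-predictive structure the abstract calls for. All names (SelfOpponentEncoder, auxiliary_losses) and the specific architecture are assumptions.

    import torch
    import torch.nn as nn

    class SelfOpponentEncoder(nn.Module):
        """Hypothetical self-and-opponent representation model.

        Encodes the agent's own observation-action history into a latent z;
        auxiliary heads decode opponent information, predict the immediate
        reward, and predict the next latent (one-step dynamics).
        """
        def __init__(self, obs_dim: int, act_dim: int, opp_dim: int, z_dim: int = 32):
            super().__init__()
            self.rnn = nn.GRU(obs_dim + act_dim, z_dim, batch_first=True)
            self.opp_decoder = nn.Linear(z_dim, opp_dim)           # reconstruct opponent info
            self.reward_head = nn.Linear(z_dim + act_dim, 1)       # predict r_t from (z_t, a_t)
            self.dynamics_head = nn.Linear(z_dim + act_dim, z_dim) # predict z_{t+1} from (z_t, a_t)

        def forward(self, obs_act_seq):
            # obs_act_seq: (batch, time, obs_dim + act_dim)
            z_seq, _ = self.rnn(obs_act_seq)
            return z_seq  # (batch, time, z_dim)

    def auxiliary_losses(model, obs_act_seq, act_seq, opp_seq, rew_seq):
        """Opponent reconstruction + reward prediction + latent consistency."""
        z = model(obs_act_seq)
        opp_loss = nn.functional.mse_loss(model.opp_decoder(z), opp_seq)
        za = torch.cat([z, act_seq], dim=-1)
        rew_loss = nn.functional.mse_loss(model.reward_head(za).squeeze(-1), rew_seq)
        # one-step latent consistency: predicted z_{t+1} vs. encoded z_{t+1}
        dyn_loss = nn.functional.mse_loss(model.dynamics_head(za)[:, :-1], z[:, 1:].detach())
        return opp_loss + rew_loss + dyn_loss

The reward and dynamics heads are one plausible way to encourage reward-predictive, Markovian latents; the paper itself may use a different auxiliary objective.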

© 2025 The Japanese Society for Artificial Intelligence