Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
A stochastic game is a game model in which agents simultaneously maximize their own cumulative rewards. A Stackelberg equilibrium is defined as a pair of policies that maximizes the leader agent's return, given that the follower agent's policy is always a best response to the leader's policy. Stationary Stackelberg equilibria (SSEs) do not always exist, and existing methods require strong assumptions to guarantee convergence and to guarantee that the limit coincides with an SSE. We propose an alternative solution concept, Pareto-optimal (PO) policies, and an algorithm for finding PO policies based on policy iteration. Our method monotonically approaches the Pareto front through iterative local policy improvements.
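
The two solution concepts named above can be illustrated on a toy problem. The following is a minimal sketch, assuming a one-shot, two-action matrix game with pure policies rather than a full stochastic game; it is not the algorithm proposed in the paper. The payoff tables and the function names `stackelberg_pure` and `pareto_optimal` are hypothetical and chosen only for illustration.

```python
# Hypothetical payoff tables for a one-shot game (an assumption, not from the paper).
# Rows index the leader's action, columns index the follower's action.
LEADER_PAYOFF = [[2.0, 4.0],
                 [1.0, 3.0]]
FOLLOWER_PAYOFF = [[1.0, 0.0],
                   [0.0, 2.0]]


def stackelberg_pure(leader_payoff, follower_payoff):
    """Return the pure (leader_action, follower_action) pair that maximizes the
    leader's payoff when the follower best-responds to the leader's action."""
    best = None
    for a, follower_row in enumerate(follower_payoff):
        # Follower's best response to leader action a (ties go to the lowest index).
        b = max(range(len(follower_row)), key=follower_row.__getitem__)
        if best is None or leader_payoff[a][b] > leader_payoff[best[0]][best[1]]:
            best = (a, b)
    return best


def pareto_optimal(leader_payoff, follower_payoff):
    """Return all action pairs whose payoff vector is not dominated by another pair."""
    outcomes = [(a, b)
                for a in range(len(leader_payoff))
                for b in range(len(leader_payoff[0]))]

    def dominates(p, q):
        # p dominates q if p is at least as good for both agents and strictly
        # better for at least one of them.
        lp, fp = leader_payoff[p[0]][p[1]], follower_payoff[p[0]][p[1]]
        lq, fq = leader_payoff[q[0]][q[1]], follower_payoff[q[0]][q[1]]
        return lp >= lq and fp >= fq and (lp > lq or fp > fq)

    return [q for q in outcomes if not any(dominates(p, q) for p in outcomes)]


if __name__ == "__main__":
    print(stackelberg_pure(LEADER_PAYOFF, FOLLOWER_PAYOFF))  # (1, 1)
    print(pareto_optimal(LEADER_PAYOFF, FOLLOWER_PAYOFF))    # [(0, 1), (1, 1)]
```

In this toy instance the Stackelberg pair happens to lie on the Pareto front, but the two concepts are defined differently: the first constrains the follower to best-respond, while the second only requires that no other pair improves one agent's payoff without lowering the other's.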