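Policy Iteration for Stationary Stackelberg Equilibria in General-sum Stochastic Games: Proposal of Pareto-optimal Policies in Terms of Stackelberg Equilibria and a Convergence Guarantee for an Iterative Method Based on Policy Improvement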

工藤 ミコト; 秋本 洋平

doi:10.11517/pjsai.JSAI2024.0_4D3GS202

38th (2024)

Session ID : 4D3-GS-2-02

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence

Number : 38

Date : May 28, 2024 - May 31, 2024

Policy Iteration for Stationary Stackelberg Equilibria in General-sum Stochastic Games

Proposal of Pareto-optimal Policies in terms of Stackelberg Equilibria and Provable Convergence Guarantee of the Iterative Method by Policy Improvements

*Mikoto KUDO, Yohei AKIMOTO

Keywords: Stochastic game, Stackelberg Equilibrium, Multi-agent MDP, Multi-agent RL, Policy guidance

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

A stochastic game is a game model in which agents simultaneously maximize their cumulative rewards. A Stackelberg equilibrium is defined as a pair of policies that maximizes the leader agent's return when the follower agent's policy is always the best response to the leader's policy. Stationary Stackelberg equilibria (SSE) do not always exist, and existing methods require strong assumptions to guarantee convergence and to guarantee that the limit coincides with an SSE. We propose an alternative solution concept, Pareto-optimal (PO) policies, and an algorithm for finding PO policies based on policy iteration. Our method monotonically approaches the Pareto front through iterative local policy improvements.
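
To make the Stackelberg definition in the abstract concrete, the following is a minimal sketch for a much simpler setting than the paper's: a single-state, one-shot game with deterministic policies, solved by brute-force enumeration. It is not the proposed policy iteration method. The function name `pure_stackelberg`, the payoff matrices, and the rule that the follower breaks ties in the leader's favour are all assumptions introduced for illustration.

```python
import numpy as np


def pure_stackelberg(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action, leader_value) for a bimatrix game.

    Rows index the leader's actions, columns the follower's actions.
    Assumption: when the follower has several best responses, it picks the
    one that is best for the leader (optimistic tie-breaking).
    """
    leader_payoff = np.asarray(leader_payoff, dtype=float)
    follower_payoff = np.asarray(follower_payoff, dtype=float)

    best = None
    for a in range(leader_payoff.shape[0]):
        # Follower's best responses to the committed leader action a.
        responses = np.flatnonzero(follower_payoff[a] == follower_payoff[a].max())
        # Tie-breaking in the leader's favour (illustrative assumption).
        b = int(responses[np.argmax(leader_payoff[a, responses])])
        value = float(leader_payoff[a, b])
        # Keep the commitment that gives the leader the highest payoff.
        if best is None or value > best[2]:
            best = (a, b, value)
    return best


# Hypothetical payoffs, chosen only to illustrate the value of commitment.
R_leader = [[2, 4],
            [1, 3]]
R_follower = [[1, 0],
              [0, 2]]

result = pure_stackelberg(R_leader, R_follower)
print(result)  # (1, 1, 3.0)
```

In this toy game the leader's action 0 strictly dominates action 1 under simultaneous play, yet committing to action 1 steers the follower to its action 1 and raises the leader's payoff from 2 to 3. The stochastic-game setting of the abstract replaces the single payoff matrix with state-dependent rewards and transitions, which is where existence of a stationary equilibrium can fail.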
