IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508
Regular Section
Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor
Shiyao DING, Toshimitsu USHIO

2019, Volume E102.A, Issue 4, pp. 708-711

Abstract

It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, in two-player two-action matrix games, the agents' policies converge to a Nash equilibrium in mixed policies under the PGLA algorithm. By simulation, we confirm the convergence and also show that the PGLA algorithm achieves better convergence than the LR-I lagging anchor algorithm.
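The letter itself gives the precise update rule and the convergence proof; as a rough illustration of the idea only, the Python sketch below combines a policy gradient step with a lagging-anchor drag term in matching pennies, a two-player two-action zero-sum matrix game whose unique Nash equilibrium is the mixed policy (0.5, 0.5) for both players. The parameterization (each policy as the probability p or q of the first action), the step size alpha, and the anchor coefficient eta are assumptions chosen for the sketch, not the authors' settings.

import numpy as np

# Matching pennies: the unique Nash equilibrium is the mixed
# policy (0.5, 0.5) for both players.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])    # row player's payoff matrix
B = -A                         # column player's payoff (zero-sum)

def payoff_grad(opp_prob, M):
    # Derivative of (p, 1-p) M (q, 1-q)^T with respect to the
    # player's own probability, at the opponent's current policy.
    return np.array([1.0, -1.0]) @ M @ np.array([opp_prob, 1.0 - opp_prob])

alpha = 0.01    # step size (assumed, not from the paper)
eta = 0.5       # anchor-drag coefficient (assumed, not from the paper)

p, q = 0.7, 0.3         # initial probabilities of the first action
p_bar, q_bar = p, q     # lagging anchors

for _ in range(50000):
    dp = payoff_grad(q, A)       # row player's payoff gradient
    dq = payoff_grad(p, B.T)     # column player's payoff gradient
    # Gradient ascent plus a pull toward the lagging anchor.
    p = np.clip(p + alpha * (dp + eta * (p_bar - p)), 0.0, 1.0)
    q = np.clip(q + alpha * (dq + eta * (q_bar - q)), 0.0, 1.0)
    # Anchors slowly track the current policies.
    p_bar += alpha * eta * (p - p_bar)
    q_bar += alpha * eta * (q - q_bar)

print(f"p = {p:.3f}, q = {q:.3f}")  # both approach 0.5

Without the anchor terms, the coupled gradient dynamics orbit the mixed equilibrium indefinitely; the drag toward the slowly tracking anchors damps the orbit, so both policies spiral in to (0.5, 0.5).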

© 2019 The Institute of Electronics, Information and Communication Engineers