2016 Volume 136 Issue 3 Pages 273-281
The Improved Penalty Avoiding Rational Policy Making algorithm (IPARP) is known to learn policies from rewards and penalties. IPARP aims to identify penalty rules, i.e., rules that are likely to lead to a penalty. Although IPARP is effective in many cases, it requires many trial-and-error searches because of its memory constraints. In this paper, we propose a method called the Expected Failure Probability Algorithm (EFPA) to speed up this learning. In addition, we extend EFPA to multi-agent environments. In multi-agent learning, it is important to avoid the concurrent learning problem, which occurs when multiple agents learn simultaneously. We also propose a method to avoid this problem and confirm its effectiveness through numerical experiments.
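The core idea the abstract describes, identifying rules that are likely to lead to a penalty, can be sketched as keeping per-rule trial and failure counts. The following is a minimal illustration only: the class name, the (state, action) rule encoding, and the Laplace-smoothed estimator are assumptions for the sketch, not EFPA's actual definition from the paper.

```python
# Hypothetical sketch: estimate, per state-action rule, the probability that
# following it ends in a penalty, from observed counts. The Laplace-smoothed
# estimator is an assumption, not the paper's definition of EFPA.
from collections import defaultdict

class FailureEstimator:
    def __init__(self):
        self.trials = defaultdict(int)    # times each rule (state, action) was tried
        self.failures = defaultdict(int)  # times that trial ended in a penalty

    def record(self, state, action, penalized):
        rule = (state, action)
        self.trials[rule] += 1
        if penalized:
            self.failures[rule] += 1

    def failure_probability(self, state, action):
        rule = (state, action)
        # Laplace smoothing: an untried rule is scored 0.5 rather than 0/0.
        return (self.failures[rule] + 1) / (self.trials[rule] + 2)

est = FailureEstimator()
for penalized in [True, False, False, True]:
    est.record("s0", "a0", penalized)
print(est.failure_probability("s0", "a0"))  # (2 + 1) / (4 + 2) = 0.5
```

A policy maker in this spirit would then avoid rules whose estimated failure probability exceeds a threshold, which matches the abstract's goal of avoiding penalty rules.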
The Transactions of the Institute of Electrical Engineers of Japan. C