合意制御に基づく協調型トンプソン抽出の検討

神村 素輝; 林 直樹; 高井 重昌

doi:10.5687/iscie.33.57

Abstract

Recently, distributed control for multi-agent systems has attracted much attention. Each agent makes a decision through interaction over a communication network. In general, there exists a trade-off between exploration of the best choice and exploitation of the obtained knowledge. Such a trade-off can be formulated as the bandit problem. In this paper, we investigate a distributed bandit problem where a group of agents cooperatively searches the best choice in a distributed manner. We propose a cooperative Thompson sampling based on the consensus algorithm of multi-agent systems. The theoretical analysis of a regret bound is carried out for the case when the communication network is represented by a complete graph. The numerical examples show that the regret can be reduced by the proposed cooperative Thompson sampling compared to the case when agents individually search the best choice without cooperation.

Content from these authors

Favorites & Alerts

Corresponding author

Register with J-STAGE for free!