IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508

この記事には本公開記事があります。本公開記事を参照してください。
引用する場合も本公開記事を引用してください。

Consensus-Based Distributed Exp3 Policy Over Directed Time-Varying Networks
Tomoki NAKAMURANaoki HAYASHIMasahiro INUIGUCHI
著者情報
ジャーナル フリー 早期公開

論文ID: 2023MAP0008

この記事には本公開記事があります。
詳細
抄録

In this paper, we consider distributed decision-making over directed time-varying multi-agent systems. We consider an adversarial bandit problem in which a group of agents chooses an option from among multiple arms to maximize the total reward. In the proposed method, each agent cooperatively searches for the optimal arm with the highest reward by a consensus-based distributed Exp3 policy. To this end, each agent exchanges the estimation of the reward of each arm and the weight for exploitation with the nearby agents on the network. To unify the explored information of arms, each agent mixes the estimation and the weight of the nearby agents with their own values by a consensus dynamics. Then, each agent updates the probability distribution of arms by combining the Hedge algorithm and the uniform search. We show that the sublinearity of a pseudo-regret can be achieved by appropriately setting the parameters of the distributed Exp3 policy.

著者関連情報
© 2023 The Institute of Electronics, Information and Communication Engineers
feedback
Top