2014 Volume 5 Issue 2 Pages 198-209
The “tug-of-war (TOW) model” is a unique parallel search algorithm for solving the multi-armed bandit problem (BP), inspired by the photoavoidance behavior of a single-celled amoeboid organism, the true slime mold Physarum polycephalum [1-4]. The “cognitive medium access (CMA) problem,” which concerns multiuser channel allocation in cognitive radio, can be interpreted as a “competitive multi-armed bandit problem (CBP)” [5, 6]. Unlike the ordinary BP, the CBP considers a competitive situation in which more than one user selects a channel, and the reward probability of each channel (the probability that the channel is free) varies with the number and combination of the users selecting it, as specified by a payoff matrix. Depending on the payoff matrix, the CBP yields hard problem instances in which the users must avoid being attracted to the Nash equilibrium in order to achieve the “social maximum,” the most desirable state, in which the total score (throughput) of all the users is maximized. The aim of this study is to explore how the users can achieve the social maximum in a decentralized manner. To this end, we propose two variants of the TOW model (solid type and liquid type) for the CBP, toward developing a CMA protocol that uses distributed control in uncertain environments. Using minimal CBP cases in which two users each choose one of two channels, we show that our solid-type TOW model outperforms the well-known upper confidence bound 1 (UCB1)-tuned algorithm, particularly on the hard problem instances. We also show that our liquid-type TOW model, which introduces direct interactions among the users to avoid mutual collisions, makes it possible to achieve the social maximum for general CBP instances.
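To make the gap between the Nash equilibrium and the social maximum concrete, the following minimal Python sketch enumerates a two-user, two-channel CBP. It is illustrative only: the free-channel probabilities (0.9 and 0.4), the rule that colliding users share a channel's reward probability equally, and the helper names `payoff`, `is_pure_nash`, and `analyze` are all assumptions made for this example and are not taken from the study.

```python
from itertools import product

# Hypothetical free-channel probabilities (assumed values, not from the paper).
FREE_PROB = (0.9, 0.4)  # channel 0, channel 1


def payoff(choices):
    """Expected reward of each user for a tuple of channel choices.

    Assumption: users that select the same channel collide and share
    that channel's reward probability equally.
    """
    return tuple(FREE_PROB[c] / choices.count(c) for c in choices)


def is_pure_nash(choices):
    """True if no single user gains by switching channels unilaterally."""
    current = payoff(choices)
    for user, chosen in enumerate(choices):
        for alt in range(len(FREE_PROB)):
            if alt == chosen:
                continue
            deviated = choices[:user] + (alt,) + choices[user + 1:]
            if payoff(deviated)[user] > current[user]:
                return False
    return True


def analyze(num_users=2):
    """Return pure Nash equilibria, social-maximum profiles, and the best total."""
    profiles = list(product(range(len(FREE_PROB)), repeat=num_users))
    nash = [p for p in profiles if is_pure_nash(p)]
    best_total = max(sum(payoff(p)) for p in profiles)
    social_max = [
        p for p in profiles if abs(sum(payoff(p)) - best_total) < 1e-12
    ]
    return nash, social_max, best_total


nash, social_max, best_total = analyze()
print("pure Nash equilibria:", nash)
print("social-maximum profiles:", social_max)
```

Under these assumed values, both users crowding onto the better channel is the only pure Nash equilibrium, whereas the total expected reward is largest when the users split across the two channels. This is the kind of instance in which selfish, independent learners can settle on a collectively suboptimal state.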