Restless Multi-Armed Bandit Game and the Swarm Intelligence Effect

吉田 俊介; 久門 正人; 守 真太郎

doi:10.1527/tjsai.30-6_JWEIN-B

Abstract

We define the swarm intelligence effect and obtain the condition for its emergence in an interactive restless multi-armed bandit game in which a player competes with multiple agents. Each arm of the bandit has a payoff that changes with probability p_c per round. The agents and the player choose one of three options: (1) Exploit (exploiting a good arm), (2) Innovate (asocial exploration for good arms), and (3) Observe (social exploration for good arms). Each agent has two parameters (c, p_obs) that specify its decision: (i) c, the threshold value for Exploit: if the agent knows only arms whose payoffs are less than c, it chooses to explore; (ii) p_obs, the probability of choosing Observe when the agent explores. The parameters (c, p_obs) of the agents are uniformly distributed. We introduce a scope n_I for searching for good arms in Innovate to control its cost. We determine the optimal strategies of the player using complete knowledge of the bandit and information on the arms exploited by the agents. We show which of social or asocial exploration is optimal in the (p_c, n_I) space. We conduct a laboratory experiment (67 subjects). If (p_c, n_I) is chosen so that social learning is far better than asocial learning, we observe the swarm intelligence effect. If (p_c, n_I) is in the region where asocial learning is optimal or comparable to social learning, we do not observe the effect.
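The agent decision rule and the per-round payoff change described in the abstract can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function names `choose_action` and `step_payoffs`, the `draw_payoff` callback, and the choice to treat a payoff exactly equal to c as good enough to exploit are all assumptions, and the payoff distribution itself is not specified in the abstract.

```python
import random


def choose_action(known_payoffs, c, p_obs, rng=random):
    """Pick one agent's action for a round: 'exploit', 'observe', or 'innovate'.

    known_payoffs: payoffs of the arms the agent currently knows (may be empty).
    c: threshold for Exploit; p_obs: probability of Observe when exploring.
    """
    # Exploit only if at least one known arm has payoff >= c (assumed boundary).
    if known_payoffs and max(known_payoffs) >= c:
        return "exploit"
    # Otherwise explore: socially with probability p_obs, asocially otherwise.
    return "observe" if rng.random() < p_obs else "innovate"


def step_payoffs(payoffs, p_c, draw_payoff, rng=random):
    """Advance the bandit one round.

    Each arm's payoff is replaced independently with probability p_c.
    draw_payoff is a hypothetical callback returning a fresh payoff.
    """
    return [draw_payoff(rng) if rng.random() < p_c else x for x in payoffs]
```

For example, `choose_action([1, 5], c=3, p_obs=0.5)` returns `"exploit"` because the agent knows an arm with payoff 5, while an agent that knows only arms below the threshold explores, choosing Observe with probability p_obs.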
