Identification of the Strongest Die in Dueling Bandits

Shang LU; Kohei HATANO; Shuji KIJIMA; Eiji TAKIMOTO

doi:10.1587/transfun.2024EAP1078

Abstract

This work introduces the dueling dice problem, a variant of the multi-armed dueling bandit problem. In this problem, a die is a set of m arms, and the goal is to find the best set of m arms among n arms (m ≤ n) by repeatedly dueling dice. In each round, the learner arbitrarily chooses two dice α ⊆ [n] and β ⊆ [n] and lets them duel: she rolls dice α and β, observes a pair of arms i ∈ α and j ∈ β, and receives a probabilistic result X_{i,j} ∈ {0, 1}. This paper investigates the sample complexity of identifying the Condorcet winner die, and gives an upper bound of O(n h^{-2} (log log h^{-1} + log(n m^2 γ^{-1})) m log m), where h is a gap parameter and γ is an error parameter. Our problem is closely related to the dueling teams problem of Cohen et al. (2021). As in Cohen et al. (2021), we assume a total order of strength over arms, which ensures the existence of the Condorcet winner die; unlike Cohen et al. (2021), we do not assume a total order of strength over dice.
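The round described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm; it only simulates the feedback model. Several details are assumptions that the abstract does not state: arms are indexed from 0, each die shows one of its arms uniformly at random, X_{i,j} = 1 means that arm i beats arm j, and the hypothetical matrix pref[i][j] holds the probability of that event. The names duel and estimate_win_rate are likewise illustrative.

```python
import random


def duel(alpha, beta, pref, rng):
    """Simulate one round: roll both dice, then duel the two observed arms.

    alpha, beta: sets of arm indices (the two dice).
    pref[i][j]:  assumed probability that arm i beats arm j.
    Returns (i, j, x) where x is the binary outcome X_{i,j}.
    """
    # Assumption: a roll shows each arm of the die with equal probability.
    i = rng.choice(sorted(alpha))
    j = rng.choice(sorted(beta))
    # Bernoulli outcome with success probability pref[i][j].
    x = 1 if rng.random() < pref[i][j] else 0
    return i, j, x


def estimate_win_rate(alpha, beta, pref, rounds, seed=0):
    """Empirical frequency with which die alpha beats die beta."""
    rng = random.Random(seed)
    wins = sum(duel(alpha, beta, pref, rng)[2] for _ in range(rounds))
    return wins / rounds


if __name__ == "__main__":
    # Hypothetical instance with n = 4 arms, arm 0 being the strongest.
    n = 4
    pref = [[0.5 + 0.1 * (j - i) for j in range(n)] for i in range(n)]
    print(estimate_win_rate({0, 1}, {2, 3}, pref, rounds=20000))
```

In this hypothetical instance, the die {0, 1} beats {2, 3} with probability equal to the average of pref over the four arm pairs, so the empirical rate should approach that average as the number of rounds grows.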
