AN EXPONENTIAL TWO-ARMED BANDIT PROBLEM WITH ONE ARM KNOWN UNDER BATCH SAMPLING

Toshio Hamada

doi:10.14490/jjss1995.25.205

Abstract

There are two kinds of experiments, e₀ and e₁. Performing e₀ or e₁ yields an observation from the exponential distribution with parameter 1 or u, respectively. The true value of u is unknown, and u is given a gamma prior distribution. The action a_i (i=0, 1) is to select e_i and perform it m times simultaneously. An n-stage sequential decision problem is constructed in which a₀ or a₁ is selected at each stage, using the information obtained up to that stage, so as to maximize the expected sum of the mn observations. The problem is formulated by dynamic programming and the optimal strategy is obtained.
The results of this paper show how to calculate numerically the critical value that plays an important role in the optimal strategy. The results also give the optimal strategy for the integer-valued parameter case of the gamma two-armed bandit problem with one arm known.
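
As an illustration only, the dynamic programming formulation described above can be sketched by brute-force backward induction. This is not the paper's method for computing the critical value. It assumes that "parameter" means the rate of the exponential distribution (density u·exp(-ux), mean 1/u) and that the prior on u is Gamma with shape alpha and rate beta, which makes the prior conjugate. The names posterior, mean_observation, value and grid are hypothetical, and the expectation over a batch is approximated with a simple midpoint rule.

```python
import math


def posterior(alpha, beta, observations):
    """Conjugate update of a Gamma(alpha, beta) prior (shape, rate) on the rate u
    after a batch of exponential observations (assumed parameterization)."""
    return alpha + len(observations), beta + sum(observations)


def mean_observation(alpha, beta):
    """Expected value of one observation from e1, E[1/u] = beta / (alpha - 1)."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for a finite mean")
    return beta / (alpha - 1)


def value(n, alpha, beta, m, grid=200):
    """Approximate maximal expected sum of the remaining m*n observations."""
    if n == 0:
        return 0.0
    # a0: m observations with mean 1 each; the belief about u is unchanged.
    known = m + value(n - 1, alpha, beta, m, grid)
    # a1: m observations with mean beta / (alpha - 1) each, then update the belief.
    unknown = m * mean_observation(alpha, beta)
    if n > 1:
        # The batch sum S satisfies S = beta * B / (1 - B) with B ~ Beta(m, alpha),
        # so the updated rate parameter is beta + S = beta / (1 - B).
        log_norm = math.lgamma(m + alpha) - math.lgamma(m) - math.lgamma(alpha)
        for k in range(grid):
            b = (k + 0.5) / grid  # midpoint of the k-th cell of (0, 1)
            log_density = (log_norm + (m - 1) * math.log(b)
                           + (alpha - 1) * math.log(1 - b))
            weight = math.exp(log_density) / grid
            unknown += weight * value(n - 1, alpha + m, beta / (1 - b), m, grid)
    return max(known, unknown)


if __name__ == "__main__":
    # Both arms have mean 1 under this prior, yet sampling e1 first is worth more.
    print(value(2, 3.0, 2.0, 2))
```

With alpha = 3, beta = 2, m = 2 and n = 2, both arms have the same expected observation, but the value is about 4.375 rather than 4, because the first batch from e₁ is informative for the second stage. The cost of this sketch grows as grid to the power n - 1, so it is practical only for small n.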
