There are two kinds of experiments, e0 and e1. Performing e0 or e1 yields an observation from the exponential distribution with parameter 1 or u, respectively. The true value of u is unknown, but u has a gamma distribution as its prior distribution. The action ai (i = 0, 1) is defined as selecting ei and performing it m times simultaneously. An n-stage sequential decision problem is constructed in which a0 or a1 is selected at each stage, using the information obtained up to that stage, so as to maximize the expected sum of the mn observations. The problem is formulated by dynamic programming, and the optimal strategy is obtained.
The results of this paper illustrate how to numerically calculate the critical value, which plays an important role in the optimal strategy. The results also give the optimal strategy for the integer-valued parameter case of the gamma two-armed bandit problem with one arm known.
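
To make the information structure of the problem concrete, the following is a minimal sketch of the Bayesian update that a strategy of this kind relies on. It is not the paper's dynamic programming solution and does not compute the critical value. It assumes that the "parameter" of the exponential distribution is a rate (so e0 has mean 1 and e1 has mean 1/u) and that the prior on u is Gamma(shape, rate); under those assumptions the posterior after m observations is again a gamma distribution. The names `GammaPosterior`, `update`, `expected_observation`, and `myopic_action` are hypothetical, and the myopic rule shown only compares one-stage expected values.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GammaPosterior:
    """Gamma(shape, rate) belief about the unknown rate u (assumed parameterization)."""
    shape: float
    rate: float

    def update(self, observations: Iterable[float]) -> "GammaPosterior":
        # Conjugate update for exponential observations with rate u:
        # shape grows by the number of observations, rate by their sum.
        obs = list(observations)
        return GammaPosterior(self.shape + len(obs), self.rate + sum(obs))

    def expected_observation(self) -> float:
        # E[1/u] under Gamma(shape, rate) is rate / (shape - 1) when shape > 1;
        # for shape <= 1 the expectation is infinite.
        if self.shape <= 1.0:
            return float("inf")
        return self.rate / (self.shape - 1.0)


def myopic_action(belief: GammaPosterior, known_mean: float = 1.0) -> int:
    # One-stage comparison only (an illustrative simplification):
    # return 1 to select e1 if its posterior mean exceeds the known mean of e0.
    return 1 if belief.expected_observation() > known_mean else 0


prior = GammaPosterior(shape=3.0, rate=4.0)
after_good = prior.update([0.5, 0.5])       # m = 2 observations from e1
after_bad = prior.update([0.1, 0.1, 0.1])   # m = 3 small observations from e1
```

An optimal strategy for the n-stage problem would also account for the value of the information gained by performing e1, which is why a critical value rather than this one-stage comparison is needed.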