JOURNAL OF THE JAPAN STATISTICAL SOCIETY
Online ISSN : 1348-6365
Print ISSN : 1882-2754
ISSN-L : 1348-6365
AN EXPONENTIAL TWO-ARMED BANDIT PROBLEM WITH ONE ARM KNOWN UNDER BATCH SAMPLING
Toshio Hamada

1995 Volume 25 Issue 2 Pages 205-216

Abstract
There are two experiments, e0 and e1. Performing e0 or e1 yields an observation from the exponential distribution with parameter 1 or u, respectively. The true value of u is unknown, but u is assumed to have a gamma prior distribution. Action ai (i = 0, 1) consists of selecting ei and performing it m times simultaneously. An n-stage sequential decision problem is constructed in which a0 or a1 is selected at each stage, using the information obtained up to that stage, so as to maximize the expected sum of the mn observations. The problem is formulated by dynamic programming, and the optimal strategy is obtained.
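For concreteness, the dynamic-programming recursion can be sketched as below. This rests on assumptions the abstract does not state: u is read as the rate of e1's exponential distribution (so an e1 observation has mean 1/u, and an e0 observation has mean 1), the prior is Gamma(α, β) with shape α > 1 and rate β, and V_k(α, β), the maximal expected sum of the remaining km observations with k stages left, is illustrative notation rather than the paper's own.

```latex
V_0(\alpha,\beta) = 0, \qquad
V_k(\alpha,\beta) = \max\Big\{\, m + V_{k-1}(\alpha,\beta),\;
  \frac{m\beta}{\alpha-1} + \mathbb{E}\big[\,V_{k-1}(\alpha+m,\ \beta+S)\,\big] \Big\},
\qquad k = 1, \dots, n.
```

Here S is the sum of the m observations from e1, the term mβ/(α − 1) is m times the predictive mean E[1/u] = β/(α − 1), and the posterior after the batch is Gamma(α + m, β + S). Under the prior predictive, S/(β + S) has a Beta(m, α) distribution, which is what makes the expectation computable numerically. Choosing a0 leaves (α, β) unchanged because it yields no information about u.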
The results of this paper show how to compute numerically the critical value that plays a central role in the optimal strategy. They also give the optimal strategy for the integer-valued parameter case of the gamma two-armed bandit problem with one arm known.
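A minimal backward-induction sketch of the problem, offered as an illustration and not as the paper's method. It assumes u is the rate of e1's exponential distribution with a Gamma(alpha, rate=beta) prior (alpha > 1), and that the critical value is the smallest beta at which a1 is weakly optimal for a given alpha and number of remaining stages, with a single sign change in beta. The batch size M, the quadrature size N_QUAD, and the helper names value, q_values, critical_beta and beta_nodes are hypothetical; the expectation over the next batch uses a simple midpoint rule on the Beta(M, alpha) variable T = S/(beta + S), which may differ from how the paper evaluates it.

```python
from functools import lru_cache
from typing import Optional

# Illustrative assumptions (not taken from the paper):
#   * e0 yields Exp(rate 1) draws (mean 1); e1 yields Exp(rate u) draws (mean 1/u).
#   * The prior on u is Gamma(shape=alpha, rate=beta) with alpha > 1.
#   * M (the batch size m) and N_QUAD are hypothetical example values.
M = 2
N_QUAD = 200


@lru_cache(maxsize=None)
def beta_nodes(a: float, b: float):
    """Midpoint nodes on (0, 1) with normalised Beta(a, b) weights."""
    ts = [(i + 0.5) / N_QUAD for i in range(N_QUAD)]
    ws = [t ** (a - 1) * (1 - t) ** (b - 1) for t in ts]
    total = sum(ws)
    return tuple(ts), tuple(w / total for w in ws)


def q_values(k: int, alpha: float, beta: float):
    """Expected totals (a0, a1) with k stages left, acting optimally afterwards."""
    # a0: M draws from the known arm (mean 1 each); the prior on u is unchanged.
    q0 = M * 1.0 + value(k - 1, alpha, beta)
    # a1: predictive mean of one draw is E[1/u] = beta / (alpha - 1).
    immediate = M * beta / (alpha - 1)
    if k == 1:
        future = 0.0  # no stages remain, so the posterior does not matter
    else:
        # With batch sum S the posterior is Gamma(alpha + M, beta + S).
        # Under the prior predictive, T = S / (beta + S) ~ Beta(M, alpha),
        # so beta + S = beta / (1 - T).
        ts, ws = beta_nodes(M, alpha)
        future = sum(w * value(k - 1, alpha + M, beta / (1 - t))
                     for t, w in zip(ts, ws))
    return q0, immediate + future


@lru_cache(maxsize=None)
def value(k: int, alpha: float, beta: float) -> float:
    """Maximal expected sum of the remaining k * M observations."""
    if k == 0:
        return 0.0
    return max(q_values(k, alpha, beta))


def critical_beta(k: int, alpha: float, lo: float = 1e-9,
                  hi: Optional[float] = None) -> float:
    """Smallest beta at which a1 is (weakly) optimal, found by bisection.

    Assumes the advantage of a1 over a0 changes sign exactly once in beta.
    """
    if hi is None:
        hi = 10.0 * (alpha - 1)

    def advantage(b: float) -> float:
        q0, q1 = q_values(k, alpha, b)
        return q1 - q0

    assert advantage(lo) < 0 <= advantage(hi), "bracket misses the critical value"
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if advantage(mid) >= 0:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for k in (1, 2):
        print(k, critical_beta(k, alpha=3.0))
```

With one stage left the sketch's critical beta is alpha - 1, where the predictive mean of e1 equals the known mean 1. With more stages left it falls below alpha - 1, because sampling e1 also buys information about u. The memoised recursion mirrors the n-stage dynamic program; only the quadrature is a stand-in choice.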
© Japan Statistical Society