Journal of the Japan Statistical Society, Japanese Issue
Online ISSN : 2189-1478
Print ISSN : 0389-5602
ISSN-L : 0389-5602
FURTHER RESULTS ON A UNIFORM TWO-ARMED BANDIT PROBLEM WITH ONE ARM KNOWN
Toshio Hamada

1985 Volume 15 Issue 2 Pages 193-208

Abstract
Suppose that there are two experiments e0 and e1, and that performing ei (i = 0, 1) yields a random sample Xi from the uniform distribution on the interval [pi, qi]. The values of p1 and q1 are known a priori, but at least one of the two values p0 and q0 is unknown. The unknown parameters are given a conjugate prior distribution. Experiments are performed sequentially n times, and each time one of the two experiments is selected and performed. The objective is to maximize the expected value of the sum of the n observations. Two cases are considered. In the first case, p0 = p1 = 0, q1 = 1, and the reward is discounted by a discount factor. In the second case, both p0 and q0 are unknown. For both cases, the problem is formulated using the principle of optimality of dynamic programming, the optimal strategy is derived, and critical values for the strategy are calculated.
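To make the setting of the first case concrete, here is a minimal Python sketch of the belief update for the unknown arm. It assumes that e0 is uniform on [0, q0] and that the conjugate prior on q0 is a Pareto distribution with scale w and shape a, which is the standard conjugate family for the upper endpoint of a uniform distribution; the abstract does not name the prior, so this is an assumption. The names ParetoBelief, expected_observation and myopic_choice are hypothetical. The one-step comparison shown is only a myopic rule for illustration, not the optimal strategy or the critical values derived in the paper.

```python
from dataclasses import dataclass

# Mean of the known arm e1, which is uniform on [0, 1] in the first case.
KNOWN_ARM_MEAN = 0.5


@dataclass(frozen=True)
class ParetoBelief:
    """Assumed Pareto(scale=w, shape=a) belief about the unknown endpoint q0."""
    scale: float  # w: q0 is known to be at least this large
    shape: float  # a: larger values mean a more concentrated belief

    def update(self, x: float) -> "ParetoBelief":
        # Conjugate update after observing x from Uniform[0, q0]:
        # the scale becomes max(w, x) and the shape increases by one.
        return ParetoBelief(max(self.scale, x), self.shape + 1.0)

    def expected_observation(self) -> float:
        # E[X] = E[q0] / 2, and E[q0] = a * w / (a - 1) when a > 1.
        if self.shape <= 1.0:
            raise ValueError("posterior mean of q0 is infinite for shape <= 1")
        return self.shape * self.scale / (2.0 * (self.shape - 1.0))


def myopic_choice(belief: ParetoBelief) -> str:
    # One-step comparison only; it ignores the value of information
    # that a dynamic programming solution would account for.
    return "e0" if belief.expected_observation() > KNOWN_ARM_MEAN else "e1"


if __name__ == "__main__":
    belief = ParetoBelief(scale=1.0, shape=3.0)
    print(belief.expected_observation(), myopic_choice(belief))
    belief = belief.update(0.4)  # an observation below the current scale
    print(belief, belief.expected_observation())
```

An observation below the current scale leaves w unchanged and only raises a, so the expected observation from e0 shrinks toward w/2; an observation above w raises the scale to that value.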
© Japan Statistical Society