Journal of the Japan Statistical Society, Japanese Issue
Online ISSN : 2189-1478
Print ISSN : 0389-5602
ISSN-L : 0389-5602
ON A UNIFORM TWO-ARMED BANDIT PROBLEM
Toshio Hamada

1984 Volume 14 Issue 2 Pages 179-187

Abstract
Suppose that there are two experiments, e0 and e1. Performing e0 yields a random sample X from the uniform distribution on the interval [0, p], and performing e1 yields a random sample Y from the uniform distribution on the interval [0, q]. The true values of p and q are unknown, but prior knowledge about them is expressed by Pareto prior distributions. When an observation x is obtained, the reward is x. Experiments are performed sequentially n times, and at each stage one of the two experiments is selected and performed. The objective is to maximize the total expected reward. The problem is formulated by dynamic programming and analyzed. It is shown that there exists a function that describes the optimal strategy, and some properties of this function are derived.
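
As an illustration of the setting only, the following minimal sketch shows why a Pareto prior is convenient for a uniform [0, p] sample: the posterior is again Pareto, so the state of knowledge about each arm reduces to two numbers. The parametrization (scale and shape), the class name ParetoBelief, and the function myopic_choice are assumptions introduced here and do not come from the paper. The one-step (myopic) comparison is a simplification, not the optimal strategy derived by dynamic programming in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParetoBelief:
    """Pareto(scale, shape) belief about the unknown endpoint of one arm.

    Assumed parametrization: density proportional to theta**-(shape + 1)
    for theta >= scale. Names are hypothetical, not taken from the paper.
    """
    scale: float  # current lower bound on the unknown endpoint
    shape: float  # tail index; the mean is finite only when shape > 1

    def update(self, x: float) -> "ParetoBelief":
        # Standard conjugate update for a uniform [0, theta] observation:
        # the scale becomes max(scale, x) and the shape increases by one.
        return ParetoBelief(max(self.scale, x), self.shape + 1.0)

    def expected_reward(self) -> float:
        # E[X] = E[theta] / 2, and E[theta] = shape * scale / (shape - 1).
        if self.shape <= 1.0:
            raise ValueError("shape must exceed 1 for a finite expected reward")
        return self.shape * self.scale / (2.0 * (self.shape - 1.0))


def myopic_choice(p_belief: ParetoBelief, q_belief: ParetoBelief) -> int:
    """Return 0 for experiment e0 or 1 for e1 by one-step expected reward.

    This greedy rule ignores the value of information and is only a
    baseline; it is not the optimal strategy studied in the paper.
    """
    if p_belief.expected_reward() >= q_belief.expected_reward():
        return 0
    return 1
```

For example, under these assumptions, ParetoBelief(1.0, 2.0).update(3.0) gives ParetoBelief(3.0, 3.0): an observation above the current scale raises the lower bound on the endpoint, while any observation sharpens the belief by increasing the shape.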
© Japan Statistical Society