Abstract
Suppose that there are two experiments e0 and e1, and that performing ei (i=0, 1) yields a random sample Xi from the uniform distribution on the interval [pi, qi]. The values of p1 and q1 are known a priori, but at least one of p0 and q0 is unknown. A conjugate prior distribution is assumed for the unknown parameters. The experiments are performed sequentially n times, and each time one of the two experiments is selected and performed. The objective is to maximize the expected value of the sum of the n observations. Two cases are considered. In the first, p0=p1=0 and q1=1, and the reward is discounted by a discount factor. In the second, both p0 and q0 are unknown. For both cases, the problem is formulated by the principle of optimality of dynamic programming, the optimal strategy is derived, and critical values for the strategy are calculated.
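As a rough illustration of the first case (p0=p1=0, q1=1, so only q0 is unknown), the sketch below assumes that the conjugate prior on q0 is a Pareto distribution, which is the standard conjugate family for the upper endpoint of a uniform distribution on [0, q0]. The abstract does not name the family, so this choice, the class name ParetoPrior and the function last_stage_choice are all hypothetical. The sketch covers only the posterior update and the decision when a single observation remains; it does not reproduce the full dynamic programming recursion or the critical values.

```python
from dataclasses import dataclass

KNOWN_ARM_MEAN = 0.5  # mean of U[0, 1], i.e. experiment e1 in the first case


@dataclass(frozen=True)
class ParetoPrior:
    """Assumed Pareto(scale, shape) belief about the unknown endpoint q0."""
    scale: float  # smallest value of q0 still considered possible
    shape: float  # larger shape means a more concentrated belief

    def update(self, x: float) -> "ParetoPrior":
        # Observing x from U[0, q0] rules out q0 < x and raises the shape by one.
        return ParetoPrior(max(self.scale, x), self.shape + 1.0)

    def predictive_mean(self) -> float:
        # E[X0] = E[q0] / 2 = shape * scale / (2 * (shape - 1)); needs shape > 1.
        if self.shape <= 1.0:
            raise ValueError("predictive mean is infinite for shape <= 1")
        return self.shape * self.scale / (2.0 * (self.shape - 1.0))


def last_stage_choice(prior: ParetoPrior) -> str:
    # With one observation left, the larger expected reward is optimal.
    # Ties go to the known experiment e1.
    return "e0" if prior.predictive_mean() > KNOWN_ARM_MEAN else "e1"
```

With more than one observation remaining, this comparison is only a myopic rule: performing e0 also yields information about q0, which is why the optimal strategy requires the dynamic programming formulation.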