Suppose that there are two experiments, e_0 and e_1, and that performing e_i (i = 0, 1) yields a random sample X_i from the uniform distribution on the interval [p_i, q_i]. The values of p_1 and q_1 are known a priori, but at least one of the two values p_0 and q_0 is unknown. The unknown parameters have a conjugate prior distribution. Experiments are performed sequentially n times, and at each stage one of the two experiments is selected and performed. The objective is to maximize the expected value of the sum of the n observations. Two cases are considered. In the first, p_0 = p_1 = 0, q_1 = 1, and the reward is discounted by a discount factor. In the second, both p_0 and q_0 are unknown. For both cases, the problem is formulated by the principle of optimality of dynamic programming, the optimal strategy is derived, and critical values for the strategy are calculated.
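The abstract does not name the conjugate family. As an illustration only, for the first case (lower endpoint fixed at 0, upper endpoint unknown) the standard conjugate prior for the upper endpoint of a uniform distribution is the Pareto distribution, and the sketch below assumes that choice. The class name `ParetoPrior` and its fields are hypothetical, and the code shows only the posterior update and the one-step expected reward, not the optimal strategy or its critical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParetoPrior:
    """Assumed Pareto(scale, shape) prior on the unknown upper endpoint.

    The density is proportional to q**-(shape + 1) for q >= scale.
    """

    scale: float  # smallest upper endpoint still considered possible
    shape: float  # larger values mean a more concentrated prior

    def update(self, x: float) -> "ParetoPrior":
        # An observation x from Uniform[0, q] has likelihood 1/q for q >= x,
        # so the posterior is Pareto(max(scale, x), shape + 1).
        return ParetoPrior(max(self.scale, x), self.shape + 1.0)

    def expected_reward(self) -> float:
        # E[X] = E[q] / 2, and E[q] = shape * scale / (shape - 1),
        # which is finite only when shape > 1.
        if self.shape <= 1.0:
            raise ValueError("expected reward is infinite for shape <= 1")
        return 0.5 * self.shape * self.scale / (self.shape - 1.0)


# Known experiment: Uniform[0, 1], so its expected reward is 0.5.
KNOWN_MEAN = 0.5

prior = ParetoPrior(scale=1.0, shape=2.0)
after_small = prior.update(0.4)  # observation below the current scale
after_large = prior.update(1.5)  # observation that raises the scale
```

Comparing `expected_reward()` with `KNOWN_MEAN` gives only a myopic, one-step comparison. The sequential problem described above also weighs the information gained from observing the unknown experiment, which is why it is formulated through dynamic programming.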