Abstract
When a reinforcement learning method is applied, the accuracy with which the state transition probabilities are estimated affects the policy obtained. It is therefore important to know how accurately these probabilities must be estimated. The objective of this study is to derive a sampling condition that guarantees an optimal policy with a desired reliability. First, we describe the relation between the Q-factors and the estimated probabilities that must hold for the correct optimal policy to be induced. This relation gives the accuracy required of the probability estimates and leads to sampling conditions. In addition, we propose a method for predicting the required number of samples in advance. Numerical simulations are used to examine the effectiveness of the proposed methods.