Sampling Policy that Guarantees Reliability of Optimal Policy in Reinforcement Learning

Kei SENDA; Yoshimitsu IWASAKI; Shinji FUJII

doi:10.9746/sicetr.46.274

Abstract

This study defines certification sampling: sampling that guarantees, with a specified reliability, that the optimal policy derived from an estimated transition probability is also optimal for the real transition probability. It then discusses a sampling policy that obtains the certification sampling efficiently. The transition probability is estimated by sampling, and the optimal policy is derived from the estimate. At the same time, the method calculates the accuracy of the estimated transition probability that is required to guarantee that the optimal policy is correct. This study proposes a sampling policy that efficiently achieves the certification sampling with this desired accuracy. The proposed method is efficient in the number of samples because it automatically selects the states and actions to be sampled and stops sampling when the condition is satisfied.
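The select-and-stop loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give their accuracy condition, so the sketch takes the required accuracy per state-action pair as an input (`required_eps`, a hypothetical name) and uses a per-entry Hoeffding bound, with no union-bound correction, as a stand-in confidence radius. The simulator callback `step` and the function names are likewise assumptions made for illustration.

```python
import math


def hoeffding_radius(n, delta):
    """Half-width of a two-sided Hoeffding interval for one probability
    estimated from n samples, at confidence 1 - delta."""
    if n == 0:
        return float("inf")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def sample_until_certified(step, n_states, n_actions, required_eps, delta,
                           max_samples=100_000):
    """Sample transitions until every (state, action) pair meets its
    required accuracy, or until max_samples is reached.

    step(s, a) -> next state index (hypothetical simulator interface).
    required_eps[s][a] is the accuracy assumed necessary for that pair;
    in the paper it is computed by the method, here it is simply given.
    """
    counts = [[[0] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    total = 0

    while total < max_samples:
        # Select the pair whose current radius exceeds its target the most.
        worst, worst_gap = None, 0.0
        for s in range(n_states):
            for a in range(n_actions):
                gap = hoeffding_radius(visits[s][a], delta) - required_eps[s][a]
                if gap > worst_gap:
                    worst, worst_gap = (s, a), gap
        if worst is None:
            break  # stopping condition: every pair is accurate enough

        s, a = worst
        counts[s][a][step(s, a)] += 1
        visits[s][a] += 1
        total += 1

    # Empirical transition probabilities; uniform if a pair was never sampled.
    p_hat = [[[c / visits[s][a] if visits[s][a] else 1.0 / n_states
               for c in counts[s][a]]
              for a in range(n_actions)]
             for s in range(n_states)]
    return p_hat, visits, total
```

Because the loop always samples the pair that is furthest from its target, pairs with loose accuracy requirements receive few samples and sampling stops as soon as every target is met, which is the source of the sample efficiency the abstract refers to.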
