Transactions of the Society of Instrument and Control Engineers
Online ISSN : 1883-8189
Print ISSN : 0453-4654
ISSN-L : 0453-4654
Systems and Information
Sampling Policy that Guarantees Reliability of Optimal Policy in Reinforcement Learning
Kei SENDA, Yoshimitsu IWASAKI, Shinji FUJII
JOURNAL FREE ACCESS

2010 Volume 46 Issue 5 Pages 274-280

Abstract
This study defines certification sampling, which guarantees with a specified reliability that the optimal policy derived from an estimated transition probability is also optimal for the real transition probability. It then discusses a sampling policy that efficiently achieves certification sampling, as follows. The transition probability is estimated by sampling, and the optimal policy is derived from that estimate. In parallel, the method calculates the accuracy of the estimated transition probability that is required to guarantee that this optimal policy is correct. This study proposes a sampling policy that efficiently achieves certification sampling with the desired accuracy of the estimated transition probability. The proposed method is efficient in the number of samples because it automatically selects the states and actions to be sampled and stops sampling as soon as the condition is satisfied.
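The loop described in the abstract (estimate, solve, compare against a required accuracy, choose what to sample next, stop) can be illustrated with a small tabular sketch. This is not the authors' algorithm; it swaps in two standard tools as assumptions: the Weissman L1 concentration inequality for the per-pair accuracy of the estimated transition probability, and a simulation-lemma bound ||Q* - Q̂*||∞ ≤ γ·ε·V_max/(1-γ) on how far the estimated Q-values can be from the true ones. The greedy policy is declared certified when, in every state, the gap between the best and second-best estimated Q-values exceeds twice that bound. Rewards `R` are assumed known, and the names `certify`, `l1_radius`, `value_iteration`, and `sampler` are hypothetical.

```python
import math
import numpy as np


def l1_radius(n, num_states, delta):
    """Hypothetical helper: Weissman-style L1 confidence radius after n samples."""
    if n == 0:
        return 2.0  # L1 distance between two distributions never exceeds 2
    r = math.sqrt(2.0 / n * math.log((2 ** num_states - 2) / delta))
    return min(2.0, r)


def value_iteration(P, R, gamma, tol=1e-10):
    """Optimal Q for transitions P[s, a, s'] and known rewards R[s, a]."""
    Q = np.zeros_like(R)
    while True:
        Q_new = R + gamma * P @ Q.max(axis=1)  # (S, A, S) @ (S,) -> (S, A)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new


def certify(sampler, R, gamma, delta, r_max, n_init=1, max_samples=100_000, seed=0):
    """Sample until the greedy policy of the estimated MDP is certified (assumes A >= 2)."""
    rng = np.random.default_rng(seed)
    S, A = R.shape
    counts = np.zeros((S, A, S))
    for s in range(S):  # initial sweep so every estimated row is defined
        for a in range(A):
            for _ in range(n_init):
                counts[s, a, sampler(s, a, rng)] += 1
    delta_sa = delta / (S * A)  # union bound over state-action pairs
    v_max = r_max / (1.0 - gamma)
    while counts.sum() < max_samples:
        n = counts.sum(axis=2)
        P_hat = counts / n[:, :, None]
        Q_hat = value_iteration(P_hat, R, gamma)
        radii = np.vectorize(lambda k: l1_radius(k, S, delta_sa))(n)
        # Simulation-lemma bound on |Q* - Q_hat| from the worst L1 radius.
        bound = gamma * radii.max() * v_max / (1.0 - gamma)
        top2 = np.sort(Q_hat, axis=1)[:, -2:]
        if np.all(top2[:, 1] - top2[:, 0] > 2.0 * bound):
            return Q_hat.argmax(axis=1), int(counts.sum())
        # Sample the least certain pair next (largest confidence radius).
        s, a = np.unravel_index(radii.argmax(), radii.shape)
        counts[s, a, sampler(s, a, rng)] += 1
    return None, int(counts.sum())  # budget exhausted without certification


# Usage on a toy 2-state, 2-action MDP whose true transitions are hidden
# behind the sampler.
P_true = np.array([[[0.7, 0.3], [0.2, 0.8]],
                   [[0.6, 0.4], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
sampler = lambda s, a, rng: rng.choice(2, p=P_true[s, a])
policy, used = certify(sampler, R, gamma=0.5, delta=0.05, r_max=1.0)
print(policy, used)
```

Two simplifications are worth stating. The stopping rule reuses a fixed δ at every check, so a rigorous guarantee under repeated testing would need a time-uniform confidence bound. It also uses a single global accuracy bound rather than a separate required accuracy per state-action pair, and it never certifies an MDP in which some state has tied optimal actions.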
© 2010 The Society of Instrument and Control Engineers