Transactions of the Society of Instrument and Control Engineers
Online ISSN : 1883-8189
Print ISSN : 0453-4654
ISSN-L : 0453-4654
Systems and Information
A Sampling Policy That Guarantees the Reliability of the Optimal Policy in Reinforcement Learning
泉田 啓, 岩崎 祥充, 藤井 信治

2010, Vol. 46, No. 5, pp. 274-280

Abstract
This study defines certification sampling, which guarantees with a specified reliability that an optimal policy derived from estimated transition probabilities is also optimal for the real transition probabilities. It then discusses a sampling policy that achieves certification sampling efficiently. The transition probabilities are estimated by sampling, and the optimal policy is derived from these estimates. In addition, the accuracy of the estimated transition probabilities needed to guarantee that this optimal policy is correct is calculated. This study proposes a sampling policy that efficiently achieves certification sampling with the required accuracy of the estimated transition probabilities. The proposed method is efficient in the number of samples because it automatically selects the states and actions to be sampled and stops sampling as soon as the required condition is satisfied.
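The abstract does not state the authors' concrete accuracy condition or selection rule, so the following Python sketch only illustrates the general estimate, certify, and resample loop under assumptions of our own: a generative model that can draw a next state for any chosen state-action pair, known rewards in [0, r_max], a discounted criterion with factor gamma, at least two actions per state, an L1 concentration bound for each empirical next-state distribution (a Weissman-type inequality, union-bounded over all pairs and sample sizes), and a simulation-lemma bound on the Q-value error. The names `certification_sampling`, `l1_radius`, and `value_iteration`, and the rule "resample the pair with the widest confidence radius", are hypothetical and are not the paper's method.

```python
import numpy as np


def value_iteration(P, R, gamma, tol=1e-10):
    """Q-values and greedy policy for transitions P[s, a, s'] and rewards R[s, a]."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)  # Bellman optimality backup, shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q, Q.argmax(axis=1)
        V = V_new


def l1_radius(n, S, delta):
    """Hypothetical anytime L1 radius for each empirical next-state distribution.

    Uses P(||P_hat - P||_1 >= eps) <= 2^S * exp(-n * eps^2 / 2), union-bounded
    over all state-action pairs and over all sample sizes n (weights 1/(n(n+1))).
    """
    log_term = S * np.log(2.0) + np.log(n.size * n * (n + 1.0) / delta)
    return np.minimum(2.0, np.sqrt(2.0 * log_term / n))


def certification_sampling(sample, R, gamma, r_max, delta=0.05,
                           batch=200, max_samples=1_000_000):
    """Sample until the greedy policy of the estimated MDP is certified optimal."""
    S, A = R.shape  # assumes A >= 2 and rewards in [0, r_max]
    counts = np.zeros((S, A, S))
    n = np.zeros((S, A))

    def draw(s, a):
        # `sample(s, a, k)` is an assumed generative model returning k next states.
        counts[s, a] += np.bincount(sample(s, a, batch), minlength=S)
        n[s, a] += batch

    for s in range(S):  # every pair needs data before P_hat is defined
        for a in range(A):
            draw(s, a)

    v_max = r_max / (1.0 - gamma)
    while True:
        P_hat = counts / n[:, :, None]
        Q_hat, policy = value_iteration(P_hat, R, gamma)
        eps = l1_radius(n, S, delta)
        # Simulation-lemma bound: ||Q* - Q_hat||_inf <= gamma * eps_max * v_max / (1 - gamma)
        q_err = gamma * eps.max() * v_max / (1.0 - gamma)
        top2 = np.sort(Q_hat, axis=1)[:, -2:]
        if np.all(top2[:, 1] - top2[:, 0] > 2.0 * q_err):
            return policy, int(n.sum()), True  # optimal w.p. >= 1 - delta
        if n.sum() >= max_samples:
            return policy, int(n.sum()), False  # budget exhausted, not certified
        s, a = np.unravel_index(np.argmax(eps), eps.shape)
        draw(s, a)  # resample the pair with the widest confidence radius


# Usage on a toy 2-state, 2-action MDP (all numbers are illustrative).
rng = np.random.default_rng(0)
P_true = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 0.0]])
policy, used, certified = certification_sampling(
    lambda s, a, k: rng.choice(2, size=k, p=P_true[s, a]), R, gamma=0.5, r_max=1.0)
print(policy, used, certified)
```

The stopping test is deliberately conservative: if every state's estimated Q-value gap between its best and second-best action exceeds twice the Q-value error bound, the greedy policy of the estimated model is also greedy for the true model, with probability at least 1 - delta. Because a single worst-case radius drives that bound, sampling always goes to the least-sampled pair; a per-pair accuracy condition, such as the one the abstract says the paper calculates, could stop earlier.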
© 2010 The Society of Instrument and Control Engineers