Transactions of the Society of Instrument and Control Engineers (計測自動制御学会論文集)
Online ISSN : 1883-8189
Print ISSN : 0453-4654
ISSN-L : 0453-4654
Optimal Policy with Reliability Guaranteed by Sampling Conditions in Reinforcement Learning
泉田 啓, 藤井 信治

2008, Vol. 44, No. 1, pp. 71-77

Abstract
When a reinforcement learning method is applied, the accuracy of the estimated state transition probabilities affects the policy that is obtained. It is therefore important to know how accurately these probabilities must be estimated. The objective of this study is to derive a sampling condition that guarantees an optimal policy with a desired reliability. First, we describe a relation between the Q-factors and the estimated probabilities under which the correct optimal policy is obtained. This relation gives the accuracy required of the probability estimates and leads to sampling conditions. In addition, we propose a method for predicting the required number of samples in advance. Numerical simulations are used to examine the effectiveness of the proposed methods.
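
As a rough illustration of what a sampling condition can look like, the sketch below uses the standard two-sided Hoeffding bound to find how many samples are needed to estimate a single transition probability to within a tolerance `epsilon` with reliability `1 - delta`. This is not the condition derived in the paper: the function name `hoeffding_sample_size` and both parameters are hypothetical, and here `epsilon` is simply an input, whereas in the paper the required accuracy follows from the relation between the Q-factors and the estimated probabilities.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    By Hoeffding's inequality, n i.i.d. observations of one transition
    (occurred / did not occur) then give an empirical frequency within
    epsilon of the true probability with probability at least 1 - delta.
    Illustrative assumption only; not the paper's sampling condition.
    """
    if not (0.0 < epsilon < 1.0):
        raise ValueError("epsilon must lie in (0, 1)")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    # Solve 2 * exp(-2 * n * epsilon^2) <= delta for n and round up.
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Tolerance 0.05 with 95% reliability.
    print(hoeffding_sample_size(0.05, 0.05))  # 738
```

The bound is distribution-free and therefore conservative. It also shows why knowing the required accuracy in advance matters: halving `epsilon` roughly quadruples the number of samples.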
© The Society of Instrument and Control Engineers