The Optimal Policy Reliability-guaranteed by Sampling Condition in Reinforcement Learning

Kei SENDA; Shinji FUJII

doi:10.9746/ve.sicetr1965.44.71

Abstract

The accuracy with which state transition probabilities are estimated affects the policy obtained when a reinforcement learning method is applied. It is therefore important to know how accurately these probabilities must be estimated. The objective of this study is to derive a sampling condition that guarantees an optimal policy with a desired reliability. First, we describe the relation between the Q-factors and the estimated probabilities under which the correct optimal policy is induced. This relation gives the required accuracy of the probability estimates and leads to sampling conditions. In addition, we propose a method for predicting the required number of samples in advance. Numerical simulations examine the effectiveness of the proposed methods.
