IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Special Section on Foundations of Computer Science
Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors
Ngo Anh VIEN, SeungGwan LEE, TaeChoong CHUNG
Journal Free Access

2010, Vol. E93.D, No. 2, pp. 271-279

Abstract

In [1] and [2], we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov Decision Process (SMDP). We first approximated the gradient of the average reward; a simulation-based algorithm, called GSMDP, was then proposed to estimate this approximate gradient using only a single sample path of the underlying Markov chain, and GSMDP was proved to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the output of the algorithm and its asymptotic output, arises because the algorithm sees only a finite data sequence.
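To make the two error sources concrete, here is a minimal GPOMDP-style sketch (after Baxter and Bartlett) of a single-sample-path gradient estimator of the kind described above. It is not the GSMDP algorithm itself, whose exact updates appear in [1] and [2]; the environment interface (env.reset, env.step returning a reward and a random sojourn time) and the policy functions are hypothetical. The discount parameter beta governs the approximation error (the bias between the true and approximate gradients), while the finite path length num_steps leaves an estimation error (the gap between the finite-sample average and its asymptotic value).

```python
import numpy as np

def single_path_gradient_estimate(env, sample_action, grad_log_policy,
                                  theta, beta=0.9, num_steps=100000,
                                  seed=0):
    """GPOMDP-style gradient estimate of the average reward from one
    sample path of a parameterized SMDP (hypothetical interface).

    beta < 1 introduces approximation error (bias); the finite path
    length num_steps leaves estimation error (finite-sample noise).
    """
    rng = np.random.default_rng(seed)
    z = np.zeros_like(theta)        # eligibility trace of score functions
    grad_sum = np.zeros_like(theta)
    total_time = 0.0                # accumulated continuous sojourn time
    state = env.reset()
    for _ in range(num_steps):
        action = sample_action(state, theta, rng)
        next_state, reward, sojourn = env.step(state, action, rng)
        # discounted trace of the policy's log-likelihood gradients
        z = beta * z + grad_log_policy(state, action, theta)
        # accumulate the reward-weighted trace; normalize by elapsed
        # continuous time, since the SMDP reward rate is per unit time
        grad_sum += reward * z
        total_time += sojourn
        state = next_state
    return grad_sum / total_time
```

Under this sketch, growing num_steps drives the estimation error toward zero (the estimate approaches its asymptotic output), while the approximation error persists for any fixed beta < 1 and typically shrinks only as beta approaches 1, at the cost of higher variance.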

© 2010 The Institute of Electronics, Information and Communication Engineers