Predictive Coding of Return Sequences by Temporal Difference Learning

岩田 一貴; 池田 和司; 酒井 英昭

doi:10.11509/sci.SCI03.0.6013.0

Abstract

We regard the sequence of returns as outputs from a parametric compound source. Using the fact that the coding rate of the source gives the amount of information about the return, we describe l-learning algorithms, based on the idea of predictive coding, for estimating the expected information gain concerning future information. Using this information gain, we propose the ratio w of return loss to information gain as a new criterion for probabilistic action selection strategies. In experiments, we found that our w-based strategy performs well compared with the conventional Q-based strategy.
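
The abstract does not say which probabilistic action selection rule is used, nor how w is computed. As a rough illustration only, the sketch below assumes Boltzmann (softmax) selection, a common probabilistic strategy, and applies it once to Q-values and once to a w-like criterion. The function names, the temperature parameter, the numeric values, and the choice to prefer smaller w (by negating it) are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def boltzmann_probs(scores, temperature=1.0):
    """Softmax over scores: a higher score gets a higher probability."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(scores, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = boltzmann_probs(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


# Hypothetical per-action values for a single state (not from the paper).
q_values = [1.0, 0.5, 0.2]   # conventional criterion: larger is better
w_values = [0.8, 0.3, 1.5]   # assumed loss-to-gain ratios

q_probs = boltzmann_probs(q_values)
# Assumption: a smaller return loss per unit of information gain is
# preferable, so the ratio is negated before the softmax.
w_probs = boltzmann_probs([-w for w in w_values])

action = select_action([-w for w in w_values])
```

Under these assumptions the two criteria can rank actions differently: the Q-based probabilities favor the first action, while the w-based probabilities favor the second.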
