Predictive Coding of Return Sequences by Temporal Difference Learning

岩田 一貴; 池田 和司; 酒井 英昭

doi:10.11509/sci.SCI03.0.6013.0

Abstract

We regard the sequence of returns as the output of a parametric compound source. Using the fact that the coding rate of the source indicates the amount of information about the return, we describe l-learning algorithms, based on the idea of predictive coding, for estimating the expected information gain concerning future information. Using this information gain, we propose the ratio w of return loss to information gain as a new criterion for probabilistic action selection strategies. In experiments, our w-based strategy performed well compared with the conventional Q-based strategy.
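As a rough illustration of how a ratio criterion could enter a probabilistic action selection rule, the sketch below applies Boltzmann (softmax) selection to either Q-values or a w-like score. It is not the authors' algorithm. The abstract does not define return loss or information gain, so several things here are assumptions made only for illustration: return loss is taken as the gap to the best Q-value, information gain is passed in as given numbers, and actions with a smaller w are preferred. The function names and the sample values are likewise hypothetical.

```python
import math
import random


def boltzmann_probs(scores, temperature=1.0):
    """Softmax over scores: a higher score gets a higher probability."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def w_ratio(q_values, info_gain, eps=1e-12):
    """Hypothetical w: (assumed) return loss divided by information gain.

    Return loss is ASSUMED to be max(Q) - Q(a); the paper may define it
    differently. eps guards against division by zero.
    """
    best = max(q_values)
    return [(best - q) / max(g, eps) for q, g in zip(q_values, info_gain)]


def select_action(scores, temperature=1.0, rng=random):
    """Sample an action index from the Boltzmann distribution over scores."""
    probs = boltzmann_probs(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


# Made-up numbers, for illustration only.
q_values = [1.0, 0.8, 0.2]
info_gain = [0.1, 0.4, 0.3]

w = w_ratio(q_values, info_gain)

# Conventional Q-based strategy: prefer a high Q.
q_probs = boltzmann_probs(q_values)

# w-based strategy (assumed direction): prefer a low w, so negate the score.
w_probs = boltzmann_probs([-x for x in w])

action = select_action([-x for x in w])
```

Compared with Q-based selection, a score of this form reduces the penalty on an action with a lower estimated return when that action is expected to yield more information. Whether this matches the paper's exact definition of w cannot be confirmed from the abstract.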
