Journal of the Japanese Society for Artificial Intelligence
Online ISSN : 2435-8614
Print ISSN : 2188-2266
Print ISSN : 0912-8085 (until 2013)
A Logarithmic-Time Updating Algorithm for TD(λ) Learning
Susumu KATAYAMA, Shigenobu KOBAYASHI

1999 Volume 14 Issue 5 Pages 879-890

Abstract

The temporal-difference (TD) method is an incremental learning method for long-term prediction problems, and most reinforcement learning methods are based on it. To cope with partial observability, it must be combined with the idea of eligibility traces, which raises the issue of time complexity. There are some conventional ways to reduce this complexity, but they are unavailable in environments where there may be a long delay between observations and their consequent rewards. In this paper we propose an algorithm that accurately computes the TD(λ) update in logarithmic time. It can safely be used in all kinds of environments, because it is proved to give the exact TD prediction. We also apply our algorithm to Sarsa(λ), a reinforcement learning method using eligibility traces; it can likewise be applied to the Q(λ)-learning algorithms. Accumulating Sarsa(λ) usually takes time linear in the number of actions for action selection. There exist two definitions of replacing Sarsa(λ); the more common and better one can, owing to a device, be computed in time logarithmic in the number of observations and the number of actions.
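For context, the sketch below shows the conventional tabular TD(λ) update with accumulating eligibility traces, i.e. the linear-time baseline whose per-step cost the paper's algorithm reduces to logarithmic time. It is not the authors' algorithm; all names (V, e, alpha, gamma, lam) and the tabular setting are illustrative assumptions.

```python
import numpy as np

def td_lambda_step(V, e, s, r, s_next, alpha=0.1, gamma=0.99, lam=0.9):
    """One conventional TD(lambda) update after observing (s, r, s_next).

    V : value estimates, shape (n_states,)
    e : accumulating eligibility traces, shape (n_states,)
    """
    delta = r + gamma * V[s_next] - V[s]  # TD error for this transition
    e *= gamma * lam                      # decay every trace: O(|S|) per step
    e[s] += 1.0                           # accumulating trace for the visited state
    V += alpha * delta * e                # update every state: O(|S|) per step
    return V, e
```

With `V = np.zeros(n_states)` and `e = np.zeros(n_states)`, calling `td_lambda_step` once per observed transition performs the standard backward-view TD(λ) update; the two whole-vector operations are exactly the per-step linear cost that the paper's logarithmic-time scheme avoids.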

© 1999 The Japanese Society for Artificial Intelligence