Journal of the Japanese Society for Artificial Intelligence
Online ISSN : 2435-8614
Print ISSN : 2188-2266
Print ISSN:0912-8085 until 2013
A Model Based Approach to Reinforcement Learning in Partially Observable Environments: Bayesian Learning of Models with Variable Memory Length
Nobuo SUEMATSU, Akira HAYASHI, Shigang LI

1998 Volume 13 Issue 3 Pages 404-414

Abstract

Most reinforcement learning (RL) algorithms assume that the learning process of an embedded agent can be formulated as a Markov Decision Process (MDP). However, this assumption does not hold for many realistic problems, and research on RL techniques for non-Markovian environments has therefore been gaining attention. We have developed a Bayesian approach to RL in non-Markovian environments, in which the environment is modeled as a history tree model, a stochastic model with variable memory length. In our approach, given a class of history trees, the agent explores the environment and learns the maximum a posteriori (MAP) model on the basis of Bayesian statistics. After the agent has learned the environment model, the optimal policy can be computed by dynamic programming. Unlike many other model learning techniques, our approach does not suffer from the problems of noise and overfitting, thanks to the Bayesian framework. We have analyzed the asymptotic behavior of the proposed algorithm and proved that if the given class contains the exact model of the environment, the model learned by our algorithm converges to it. We also present the results of our experiments in two non-Markovian environments.
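The abstract describes the approach only at a high level. As an illustrative aside, the following is a minimal Python sketch of the general idea of a variable-memory history tree: contexts of past observations are stored as tree nodes with transition counts, and a Bayesian (Dirichlet marginal likelihood) score with a size penalty decides how deep the memory should be. All names, the hyperparameters, and the pruning rule here are assumptions made for illustration; they are not taken from the paper's algorithm.

import math
from collections import defaultdict

ALPHA = 0.5          # symmetric Dirichlet hyperparameter (assumed)
SPLIT_PRIOR = -2.0   # log-prior penalty per extra node (assumed)

class Node:
    def __init__(self):
        self.counts = defaultdict(int)   # next-observation counts for this context
        self.children = {}               # previous observation -> deeper context node

def log_marginal(counts, n_symbols):
    """Dirichlet-multinomial log marginal likelihood of the counts at one node."""
    total = sum(counts.values())
    score = math.lgamma(ALPHA * n_symbols) - math.lgamma(ALPHA * n_symbols + total)
    for c in counts.values():
        score += math.lgamma(ALPHA + c) - math.lgamma(ALPHA)
    return score

def grow(root, history, next_obs, max_depth):
    """Record one transition along the suffix of the recent history."""
    node = root
    node.counts[next_obs] += 1
    for obs in reversed(history[-max_depth:]):
        node = node.children.setdefault(obs, Node())
        node.counts[next_obs] += 1

def prune(node, n_symbols):
    """Keep a deeper context only if splitting improves the (penalized) score."""
    keep_score = log_marginal(node.counts, n_symbols)
    split_score = SPLIT_PRIOR * len(node.children)
    for child in node.children.values():
        split_score += prune(child, n_symbols)
    if node.children and split_score > keep_score:
        return split_score
    node.children.clear()                # a shorter memory suffices for this context
    return keep_score

# Toy usage with a 2-symbol observation alphabet
root = Node()
obs_seq = [0, 1, 0, 0, 1, 0, 0, 1]
for t in range(1, len(obs_seq)):
    grow(root, obs_seq[:t], obs_seq[t], max_depth=3)
prune(root, n_symbols=2)

In the paper itself, the learned MAP model is then treated as the environment model from which the optimal policy is computed by dynamic programming; the sketch above only covers the model-selection intuition.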

© 1998 The Japanese Society for Artificial Intelligence