IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
Hierarchical Reinforcement Learning in Partially Observable Markovian Environments
A Proposal of Switching Q-learning
Hiroyuki Kamaya, Kenichi Abe

2002 Volume 122 Issue 7 Pages 1186-1193

Abstract
The most widely used reinforcement learning (RL) algorithms are limited to Markovian environments. To handle larger-scale partially observable Markov decision processes (POMDPs), we propose a new online hierarchical RL algorithm called Switching Q-learning (SQ-learning). The basic idea of SQ-learning is that non-Markovian tasks can be decomposed automatically into subtasks solvable by multiple policies, without any prior information identifying good subgoals. To perform this decomposition, SQ-learning employs ordered sequences of Q modules, each of which discovers a local control policy based on Sarsa(λ). Furthermore, a hierarchical-structure learning automaton finds appropriate subgoal sequences according to the LR-I (linear reward-inaction) algorithm. The results of extensive simulations demonstrate the effectiveness of SQ-learning.
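For readers unfamiliar with the two learning rules named in the abstract, the following Python sketch illustrates a tabular Sarsa(λ) update and a linear reward-inaction (LR-I) automaton update. This is a minimal illustration under assumed tabular settings; the function names, hyperparameters, and the binary success signal are our assumptions for exposition, not details taken from the paper, which combines these rules in its own hierarchical architecture.

    import numpy as np

    def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next,
                            alpha=0.1, gamma=0.95, lam=0.9):
        """One tabular Sarsa(lambda) step; each Q module in SQ-learning
        is described as learning a local policy with a rule of this form."""
        delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error
        E[s, a] += 1.0                                   # accumulating trace
        Q += alpha * delta * E                           # credit all traced pairs
        E *= gamma * lam                                 # decay all traces
        return Q, E

    def lri_update(p, chosen, success, beta=0.05):
        """Linear reward-inaction (LR-I) step: reinforce the chosen action's
        probability on success; leave p unchanged on failure (the 'inaction')."""
        if success:                   # binary success signal assumed here
            p = (1.0 - beta) * p      # shrink every probability...
            p[chosen] += beta         # ...and give the freed mass to the winner
        return p / p.sum()            # renormalize against rounding drift

    # Illustrative usage with arbitrary sizes and indices:
    Q = np.zeros((5, 2))              # 5 observations, 2 actions
    E = np.zeros_like(Q)
    Q, E = sarsa_lambda_update(Q, E, s=0, a=1, r=1.0, s_next=2, a_next=0)

    p = np.full(3, 1.0 / 3.0)         # automaton over 3 candidate subgoals
    p = lri_update(p, chosen=1, success=True)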
© The Institute of Electrical Engineers of Japan