The most widely used reinforcement learning (RL) algorithms are limited to Markovian environments. To handle larger-scale partially observable Markov decision processes, we propose a new online hierarchical RL algorithm called Switching Q-learning (SQ-learning). The basic idea of SQ-learning is that non-Markovian tasks can be automatically decomposed into subtasks solvable by multiple policies, without any additional information identifying good subgoals. To realize such decomposition, SQ-learning employs ordered sequences of Q modules, in which each module discovers a local control policy based on Sarsa(λ).
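As a rough sketch of what one such Q module might look like (not the paper's implementation; the class name, hyperparameters, and tabular observation/action encoding are all illustrative assumptions), a standard tabular Sarsa(λ) learner with accumulating eligibility traces could be written as:

```python
# A minimal tabular Sarsa(lambda) update, as one Q module might use.
# Names and defaults (n_obs, alpha, gamma, lam, ...) are assumptions,
# not the paper's notation.
import numpy as np

class QModule:
    """One Q module: learns a local control policy with Sarsa(lambda)."""

    def __init__(self, n_obs, n_actions, alpha=0.1, gamma=0.95,
                 lam=0.9, epsilon=0.1, seed=0):
        self.Q = np.zeros((n_obs, n_actions))  # action-value estimates
        self.e = np.zeros_like(self.Q)         # eligibility traces
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)

    def select_action(self, obs):
        # Epsilon-greedy over the module's current value estimates.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[obs]))

    def update(self, obs, action, reward, next_obs, next_action):
        # Sarsa(lambda): TD error uses the action actually chosen next,
        # and is propagated backward through accumulating traces.
        td_error = (reward + self.gamma * self.Q[next_obs, next_action]
                    - self.Q[obs, action])
        self.e[obs, action] += 1.0           # accumulate trace
        self.Q += self.alpha * td_error * self.e
        self.e *= self.gamma * self.lam      # decay all traces

    def reset_traces(self):
        # Traces are cleared at episode (or subtask) boundaries.
        self.e.fill(0.0)
```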
Furthermore, a hierarchical-structure learning automaton is used that finds appropriate subgoal sequences according to the L_{R-I} (linear reward-inaction) algorithm. The results of extensive simulations demonstrate the effectiveness of SQ-learning.
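For reference, the L_{R-I} update itself is simple: on a success signal, the chosen action's probability is reinforced and all probabilities are renormalized linearly; on failure, nothing changes ("inaction"). The sketch below shows only a single flat automaton choosing among candidate actions (e.g., subgoal sequences); the hierarchical arrangement used in SQ-learning would compose such automata, and the class name, step size, and binary success signal are illustrative assumptions.

```python
# A minimal linear reward-inaction (L_{R-I}) learning automaton.
# All names here are assumptions for illustration, not the paper's code.
import numpy as np

class LriAutomaton:
    """Variable-structure learning automaton with the L_{R-I} scheme."""

    def __init__(self, n_actions, a=0.05, seed=0):
        self.p = np.full(n_actions, 1.0 / n_actions)  # action probabilities
        self.a = a                                    # reward step size
        self.rng = np.random.default_rng(seed)

    def choose(self):
        # Sample an action (e.g., a candidate subgoal sequence) from p.
        return int(self.rng.choice(len(self.p), p=self.p))

    def update(self, action, success):
        # L_{R-I}: on success, p_i <- p_i + a * (1 - p_i) for the chosen
        # action i and p_j <- (1 - a) * p_j for all others; on failure,
        # the probabilities are left unchanged ("inaction").
        if success:
            self.p *= (1.0 - self.a)
            self.p[action] += self.a
```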