Hierarchical Reinforcement Learning in Partially Observable Markovian Environments: A Proposal of Switching Q-learning

Hiroyuki Kamaya; Kenichi Abe

doi:10.1541/ieejeiss1987.122.7_1186

Abstract

The most widely used reinforcement learning (RL) algorithms are limited to Markovian environments. To handle larger-scale partially observable Markov decision processes (POMDPs), we propose a new on-line hierarchical RL algorithm called Switching Q-learning (SQ-learning). The basic idea of SQ-learning is that a non-Markovian task can be decomposed automatically into subtasks, each solvable by its own policy, without any additional information indicating good subgoals. To achieve this decomposition, SQ-learning employs ordered sequences of Q-modules, in which each module learns a local control policy using Sarsa(λ). In addition, a hierarchical-structure learning automaton finds appropriate subgoal sequences using the L_{R-I} (linear reward-inaction) algorithm. The results of extensive simulations demonstrate the effectiveness of SQ-learning.
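
As a rough illustration of the L_{R-I} scheme named in the abstract, the sketch below shows the textbook linear reward-inaction update for a single learning automaton. It is not the authors' implementation: the function name `lri_update`, the parameter `step`, and the binary `rewarded` signal are assumptions made for this example, and how SQ-learning wires such automata into a hierarchy over subgoals is not specified in the abstract.

```python
def lri_update(probs, chosen, rewarded, step=0.1):
    """Textbook linear reward-inaction (L_{R-I}) update (illustrative only).

    probs    -- action probabilities of one automaton (should sum to 1)
    chosen   -- index of the action that was just taken
    rewarded -- True if the environment response was favorable
    step     -- learning rate in (0, 1); hypothetical default
    """
    if not rewarded:
        # "Inaction": an unfavorable response leaves the probabilities unchanged.
        return list(probs)
    # "Reward": move probability mass toward the chosen action and
    # shrink every other action proportionally, so the sum stays 1.
    return [
        p + step * (1.0 - p) if i == chosen else (1.0 - step) * p
        for i, p in enumerate(probs)
    ]


if __name__ == "__main__":
    probs = [0.5, 0.5]
    probs = lri_update(probs, chosen=0, rewarded=True)
    print(probs)
```

Because unfavorable responses change nothing, the probabilities only ever move toward actions that have been rewarded, which is the defining property of the reward-inaction scheme.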
