Abstract
As Q-Learning is formulated to solve reinforcement learning problems modeled as Markov decision processes (MDPs), its learning capability is limited to task environments whose structure can be captured by a single MDP model. To address this issue, we propose a new learning method with multiple Q-tables that are flexibly switched according to the identified dynamics of the task environment. Our learning model consists of two processing modules subsuming the Q-Learning algorithm: the system identification module and the adaptive state space divider. The former functions as a detector of different environmental dynamics, which we call "task situations". The latter autonomously regulates the effective resolution of the agent's internal state space by adapting to the distribution of experienced sensory data. This paper explains our basic ideas and then presents results from a preliminary experiment using a cart-pole swing-up simulation.
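To make the switching idea concrete, the following is a minimal sketch, not the authors' implementation, of Q-Learning with one Q-table per identified task situation. The functions identify_situation and discretize are hypothetical stand-ins for the system identification module and the adaptive state space divider described above.

```python
# Illustrative sketch: Q-Learning with multiple Q-tables keyed by an
# identified "task situation". The situation identifier and state
# discretization are hypothetical placeholders, not the paper's method.
from collections import defaultdict
import random

N_ACTIONS = 2
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

# One Q-table per identified task situation: situation -> state -> action values.
q_tables = defaultdict(lambda: defaultdict(lambda: [0.0] * N_ACTIONS))

def identify_situation(observation_history):
    """Hypothetical stand-in for the system identification module:
    returns a discrete label for the current environmental dynamics."""
    return 0  # single situation in this toy sketch

def discretize(observation):
    """Hypothetical stand-in for the adaptive state space divider:
    maps a raw observation to a discrete internal state."""
    return tuple(round(x, 1) for x in observation)

def select_action(situation, state):
    """Epsilon-greedy action selection on the active situation's Q-table."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    values = q_tables[situation][state]
    return max(range(N_ACTIONS), key=lambda a: values[a])

def update(situation, state, action, reward, next_state):
    """Standard Q-Learning update applied only to the active situation's table."""
    q = q_tables[situation]
    target = reward + GAMMA * max(q[next_state])
    q[state][action] += ALPHA * (target - q[state][action])
```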