2012, Vol. 27, No. 2, pp. 92-102
Future robots and agents will need to perform situated behaviors tailored to each user, which demands flexible behavioral learning that can cope with diverse and unexpected user situations. Such unexpected situations are usually intractable for machine learning systems designed around pre-defined problems. As a first step toward such a flexible learning system, we aimed to create a learning model that can function in several different kinds of state transitions without specific adjustments for each transition. We constructed a modular neural network model based on reinforcement learning, expecting that combining a modular architecture with neural networks would accelerate their learning. The inputs of our model always include not only observed states but also memory information, regardless of the transition. In pure Markov decision processes, memory information is unnecessary and can even degrade performance; under partially observable conditions, however, memory information is required to select proper actions. We demonstrated that the model can learn these multiple kinds of state transitions with the same architecture and parameters, and without pre-designed models of the environments. This paper reports the performance of the constructed models on probabilistically fluctuating Markov decision processes, including partially observable conditions in which the observed state fluctuates probabilistically. The model functioned in these complex transitions, and its learning speed was comparable to that of a reinforcement learning algorithm implemented with a pre-defined, optimized table representation of states.
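To illustrate why memory information matters under partial observability, the following is a minimal sketch, not the authors' model: a hypothetical two-step task in which a cue is shown first and a later aliased observation requires the action matching that cue. The agent's input concatenates the current observation with a memory of the previous one, and a simple tabular Q-update (a stand-in for the paper's modular neural network) learns the task; without the memory component the aliased observation would be ambiguous. All names and parameters here are illustrative assumptions.

```python
import random

# Hypothetical aliased task: a cue ('L' or 'R') appears at t=0, then the
# ambiguous observation 'X' at t=1; reward is given only for choosing the
# action that matches the earlier cue. The agent's effective state is the
# pair (observation, memory), mirroring the paper's idea of always feeding
# memory information alongside the observed state.

def episode(q, eps=0.1, alpha=0.5):
    cue = random.choice(['L', 'R'])
    obs, memory = 'X', cue                 # input = observation + memory
    key = (obs, memory)
    q.setdefault(key, {'L': 0.0, 'R': 0.0})
    # epsilon-greedy action selection
    if random.random() < eps:
        action = random.choice(['L', 'R'])
    else:
        action = max(q[key], key=q[key].get)
    reward = 1.0 if action == cue else 0.0
    # one-step Q-update (episode ends here, so no bootstrap term)
    q[key][action] += alpha * (reward - q[key][action])
    return reward

def train(episodes=2000, seed=0):
    random.seed(seed)
    q = {}
    for _ in range(episodes):
        episode(q)
    return q
```

With the memory in the input, the two contexts behind the same observation 'X' become separate table entries, so the greedy policy can select the correct action in each; a memoryless agent would see a single entry for 'X' and could not exceed chance.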