Abstract
In reinforcement learning problems, agents must learn solely from the rewards provided by the environment; hence, learning by trial and error is inevitable. To acquire proper action policies, action-value functions are often estimated. In many cases, these action-value functions are approximated by parametric linear or nonlinear functions such as RBF networks. However, when RBF networks are trained incrementally, they often suffer from a serious problem called interference, in which input-output relations acquired in the past are forgotten. In this work, we propose a new approach to learning action-value functions using an RBF network with a memory mechanism. In simulations, we verify that the proposed model can acquire proper policies even in difficult situations.
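As a concrete illustration of the interference problem described above, the following is a minimal sketch, not the authors' model: a plain RBF network fitted incrementally by stochastic gradient descent, where training on a new region of the input space can disturb an earlier fit. The class name, the Gaussian centers and width, the learning rate, and the sine target function are all illustrative assumptions, not details taken from this paper.

import numpy as np

class RBFNetwork:
    """Illustrative RBF function approximator with fixed centers and linear output weights."""
    def __init__(self, centers, width, lr=0.1):
        self.centers = np.asarray(centers, dtype=float)  # fixed Gaussian centers
        self.width = width                               # shared Gaussian width
        self.weights = np.zeros(len(self.centers))       # trainable output weights
        self.lr = lr

    def features(self, x):
        # Gaussian basis activations for a scalar input x
        return np.exp(-((x - self.centers) ** 2) / (2.0 * self.width ** 2))

    def predict(self, x):
        return float(self.features(x) @ self.weights)

    def update(self, x, target):
        # One incremental gradient step toward the target value
        phi = self.features(x)
        error = target - phi @ self.weights
        self.weights += self.lr * error * phi

# Fit the left half of the input space first, then train only on the right half;
# overlap of the basis functions near the boundary can disturb the earlier fit
# (interference), which the before/after error on the left half makes visible.
net = RBFNetwork(centers=np.linspace(0.0, 1.0, 10), width=0.15)
left = np.linspace(0.0, 0.5, 50)
right = np.linspace(0.5, 1.0, 50)
target = lambda x: np.sin(2 * np.pi * x)

for _ in range(200):
    for x in left:
        net.update(x, target(x))
err_before = np.mean([(net.predict(x) - target(x)) ** 2 for x in left])

for _ in range(200):
    for x in right:
        net.update(x, target(x))
err_after = np.mean([(net.predict(x) - target(x)) ** 2 for x in left])

print(f"left-half MSE before/after training on the right half: {err_before:.4f} / {err_after:.4f}")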