Abstract
Most reinforcement learning algorithms have difficulty achieving optimal performance when the state of the environment is only partially observable. The authors have previously proposed a method for overcoming this problem by incorporating recurrent neural networks into a learning agent. In this paper, we discuss the implementation of the proposed method using several network architectures and supervised learning algorithms. Furthermore, the internal representation of the environment acquired by the learning agent is examined using cluster analysis. The results show that, despite incomplete perception of the state of the environment, the learning agent achieves optimal performance in reinforcement learning tasks by constructing an accurate internal model.
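The core idea summarized above, an agent that compensates for partial observability by maintaining an internal state through a recurrent network, can be illustrated with a minimal sketch. This is not the authors' actual architecture; all sizes, weights, and names below are illustrative assumptions, and training (e.g., by backpropagation through time) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, HID_DIM, N_ACTIONS = 4, 8, 2  # toy sizes, chosen arbitrarily

# Randomly initialized weights of an Elman-style recurrent network
# (a stand-in for whatever architecture the paper actually uses).
W_in = rng.normal(scale=0.1, size=(HID_DIM, OBS_DIM))    # observation -> hidden
W_rec = rng.normal(scale=0.1, size=(HID_DIM, HID_DIM))   # hidden -> hidden (recurrence)
W_out = rng.normal(scale=0.1, size=(N_ACTIONS, HID_DIM)) # hidden -> action values

def step(obs, h):
    """One time step: update the agent's internal state and produce action values."""
    h_new = np.tanh(W_in @ obs + W_rec @ h)  # internal state summarizes history
    q = W_out @ h_new                        # action values from internal state
    return q, h_new

# Because the hidden state carries history, two identical observations reached
# via different histories can yield different internal states, which is what
# lets the agent disambiguate perceptually aliased states of the environment.
h = np.zeros(HID_DIM)
for obs in [np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])]:
    q, h = step(obs, h)
greedy_action = int(np.argmax(q))
```

The recurrence `W_rec @ h` is the crucial element: it is what allows the agent to build the kind of internal model of the environment whose cluster structure the paper then analyzes.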