2012 Volume 48 Issue 11 Pages 790-798
We recently proposed swarm reinforcement learning methods, in which multiple agent-environment pairs are prepared and each agent learns not only by individually running a conventional reinforcement learning method but also by exchanging information with the other agents. As a first stage of this research, the methods were applied to problems with discrete state-action spaces. Many real-world problems, however, are formulated with continuous state-action spaces. This paper proposes swarm reinforcement learning methods that rapidly acquire optimal policies for problems with continuous state-action spaces. The information exchange methods we proposed for discrete problems cannot be applied directly to continuous problems in which the state space is higher-dimensional and the state value function is more strongly nonlinear. We therefore propose new information exchange methods that can be applied to such continuous problems. The proposed methods are applied to a biped robot control problem, and their performance is examined through numerical experiments.