2020 Volume 140 Issue 2 Pages 242-248
In multiagent environments, a centralized reinforcement learner can find optimal policies, but learning is time-consuming. A method has been proposed to accelerate the search for optimal policies. It relies mainly on the centralized learner and, in the early phase of learning, additionally uses independent learners. The independent learners transfer their learning results to the centralized learner, but excessive transfers cause learning to fail, so the independent learners should be stopped according to an appropriate condition. However, this method has difficulty finding optimal policies in environments where the initial states are far from the terminal states. To accelerate the search for optimal policies in such environments, this paper proposes multiagent reinforcement learning methods that introduce new stop conditions.
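The abstract does not give the transfer rule or the stop conditions, so the following is only a minimal sketch of the general idea under stated assumptions: two agents in a one-state, one-step cooperative game, tabular value updates, a transfer rule that blends the mean of the independent learners' values into the joint-action table, and a fixed episode budget as the stop condition. All names (`transfer`, `should_stop`, `BETA`, `max_transfer_episodes`) and the toy reward are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # hypothetical action set shared by both agents
ALPHA = 0.1       # hypothetical learning rate
BETA = 0.5        # hypothetical weight given to transferred values


def q_update(table, key, reward):
    """Tabular value update for a one-step episode (no bootstrapping term)."""
    table[key] += ALPHA * (reward - table[key])


def transfer(q_ind, q_joint, state, beta=BETA):
    """Blend independent values into the joint table (assumed rule: mean of both agents)."""
    for a0 in ACTIONS:
        for a1 in ACTIONS:
            target = 0.5 * (q_ind[0][(state, a0)] + q_ind[1][(state, a1)])
            q_joint[(state, a0, a1)] += beta * (target - q_joint[(state, a0, a1)])


def should_stop(episode, max_transfer_episodes):
    """Placeholder stop condition: a fixed episode budget (not the paper's condition)."""
    return episode >= max_transfer_episodes


def reward_fn(a0, a1):
    """Toy cooperative reward: 1 only when both agents choose action 1."""
    return 1.0 if a0 == a1 == 1 else 0.0


def train(episodes=200, max_transfer_episodes=50, seed=0):
    rng = random.Random(seed)
    q_ind = [defaultdict(float), defaultdict(float)]  # one table per independent learner
    q_joint = defaultdict(float)                      # centralized joint-action table
    state = 0
    for ep in range(episodes):
        a0, a1 = rng.choice(ACTIONS), rng.choice(ACTIONS)  # uniform exploration
        r = reward_fn(a0, a1)
        # The centralized learner updates in every episode.
        q_update(q_joint, (state, a0, a1), r)
        # Independent learners run and transfer only until the stop condition holds.
        if not should_stop(ep, max_transfer_episodes):
            q_update(q_ind[0], (state, a0), r)
            q_update(q_ind[1], (state, a1), r)
            transfer(q_ind, q_joint, state)
    return q_joint


def greedy(q_joint, state):
    """Joint action with the highest centralized value in the given state."""
    pairs = [(a0, a1) for a0 in ACTIONS for a1 in ACTIONS]
    return max(pairs, key=lambda p: q_joint[(state,) + p])
```

Calling `greedy(train(), 0)` reads off the joint action the centralized table prefers after training. Replacing `should_stop` with a different predicate is where an alternative stop condition would plug in; the fixed budget here only marks that position in the loop.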
The transactions of the Institute of Electrical Engineers of Japan.C
The Journal of the Institute of Electrical Engineers of Japan