Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
In multi-agent reinforcement learning, the learning time and the number of trials grow with the size of the perceived state space, which increases in proportion to the view width and the square of the number of agents. To improve performance by reducing the number of states, a method that coarse-grains the perceptive information has been proposed. This method achieves faster learning; however, it lowers the accuracy of behavior selection, an indicator used to evaluate learning outcomes. In this report, we propose a method that coarse-grains the perceptive information adaptively according to the progress of learning, based on Parallel Learning and Average Residual Entropy. The proposed method achieves both faster learning and maintained accuracy of behavior selection.
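To make the state-reduction idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: it coarse-grains a grid-world observation by binning the exact relative position of another agent into a coarse direction/range pair. The function name, bin width, and encoding are assumptions for illustration only.

```python
def coarse_grain(dx, dy, near=2):
    """Map an exact relative position (dx, dy) of another agent to a
    coarse (direction, range) pair.  Bin width `near` is an assumption."""
    direction = (0 if dx == 0 else (1 if dx > 0 else -1),
                 0 if dy == 0 else (1 if dy > 0 else -1))
    rng = "near" if max(abs(dx), abs(dy)) <= near else "far"
    return direction, rng

# With view width W, the exact encoding distinguishes (2W+1)^2 relative
# positions per other agent; the coarse encoding above has at most
# 9 directions x 2 ranges = 18 states, so the table an agent must learn
# shrinks accordingly (at the cost of behavior-selection accuracy).
print(coarse_grain(1, -4))   # ((1, -1), 'far')
print(coarse_grain(-2, 0))   # ((-1, 0), 'near')
```

An adaptive scheme, as proposed in the report, would switch between such coarse states early in learning and finer states later, using a criterion such as Average Residual Entropy to judge learning progress.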