2009, Volume 27, Issue 3, pp. 350-357
Existing reinforcement learning approaches suffer from policy changes by other agents in multi-agent dynamic environments. Such changes can cause sudden shifts in the state transition probabilities, which must remain constant for behavior learning to converge. RoboCup competitions are a typical example, because the behaviors of other agents may change the state transition probabilities. A modular learning system could solve this problem if each module can be assigned to one situation in which it can regard the state transition probabilities as constant. Scheduling for learning is introduced to avoid the complexity of autonomous situation assignment. Furthermore, introducing macro actions reduces the exploration space and would enable agents to learn competitive behaviors simultaneously in such an adversarial environment. This paper presents a modular learning method for a multi-agent environment in which the learning agents can learn their behaviors and adapt to the situations that result from the others’ behaviors.