Proceedings of the Fuzzy System Symposium
37th Fuzzy System Symposium
Session ID : MD2-3
Multi-Agent Reinforcement Learning by a Policy Gradient Method: Policy Function Approximation by a Boltzmann Machine
*Seiji ISHIHARA, Harukazu IGARASHI
Abstract

Policy gradient methods such as the REINFORCE algorithm, which express the gradient of the expected reward without using value functions, need not assume that agents' policies or the environmental models of rewards and state-transition probabilities have the Markov property when applied to multi-agent systems. One such policy gradient method uses an objective function, to be minimized when determining an action, as the energy function of the Boltzmann selection that defines the policy; it has been shown that this objective function can be constructed flexibly. On the other hand, reinforcement learning in multi-agent systems suffers from the state-explosion problem. As one effective measure against this problem, a method that approximates the value function with a Boltzmann machine has been proposed. In this paper, we first propose a policy gradient method that approximates the objective function in the Boltzmann-selection policy by the energy of a Boltzmann machine. Second, we propose a more efficient method that approximates the objective function by the energy of a modular-structured restricted Boltzmann machine. In experiments on a pursuit problem, both proposed methods learned appropriate policies with a small number of parameters. Furthermore, the second method significantly reduced the computational cost of learning compared to the first.
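For concreteness, a Boltzmann-selection policy of the kind described above takes the standard form below, where E(s, a; \theta) denotes the objective function used as the energy and T the temperature; these symbols are generic and not taken from the paper itself:

    \pi(a \mid s; \theta) = \frac{\exp(-E(s, a; \theta)/T)}{\sum_{b} \exp(-E(s, b; \theta)/T)}

    \nabla_\theta \log \pi(a \mid s; \theta) = \frac{1}{T}\left[\sum_{b} \pi(b \mid s; \theta)\, \nabla_\theta E(s, b; \theta) - \nabla_\theta E(s, a; \theta)\right]

The following minimal Python sketch illustrates the first proposed idea under these assumptions: the free energy of a small restricted Boltzmann machine (binary hidden units summed out analytically) serves as E(s, a; \theta) inside Boltzmann selection. All names, encodings, and sizes here are hypothetical and for illustration only, not the paper's implementation.

    import numpy as np

    def rbm_free_energy(v, W, b, c):
        # Free energy of a restricted Boltzmann machine with visible vector v,
        # weight matrix W, visible biases b, and hidden biases c; the binary
        # hidden units are summed out analytically via log(1 + exp(.)).
        return -v @ b - np.sum(np.logaddexp(0.0, v @ W + c))

    def boltzmann_policy(state, actions, W, b, c, T=1.0):
        # Boltzmann selection over candidate actions: the RBM free energy of
        # each (state, action) encoding plays the role of the objective
        # function E(s, a; theta); lower energy means higher probability.
        E = np.array([rbm_free_energy(np.concatenate([state, a]), W, b, c)
                      for a in actions])
        logits = -E / T
        logits -= logits.max()        # subtract max for numerical stability
        p = np.exp(logits)
        return p / p.sum()

    # Toy usage: 4 state bits + 2 action bits as visible units, 3 hidden units.
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(6, 3))
    b = np.zeros(6)
    c = np.zeros(3)
    state = np.array([1.0, 0.0, 1.0, 0.0])
    actions = [np.array([x, y]) for x in (0.0, 1.0) for y in (0.0, 1.0)]
    print(boltzmann_policy(state, actions, W, b, c))  # probabilities over 4 actions

Because the hidden units are marginalized in closed form, the number of learned parameters grows with the numbers of visible and hidden units rather than with the size of the joint state-action space, which is the parameter-saving effect the abstract reports.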

© 2021 Japan Society for Fuzzy Theory and Intelligent Informatics