In multiagent environments, a centralized reinforcement learner can find optimal policies, but doing so is time-consuming. A method has been proposed that accelerates the search for optimal policies by combining the centralized learner with supplemental independent learners. To prevent learning from failing, the independent learners must be stopped in a timely manner, which is achieved by finely tuning a reward; this reward tuning, however, requires additional time and effort. This paper proposes a reinforcement learning method in which the stopping reward is set automatically.
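The abstract does not specify the algorithm, but the architecture it describes can be illustrated with a minimal sketch. Everything below is assumed for illustration only: a stateless two-agent coordination game, tabular Q-learning, and a hypothetical threshold STOP_REWARD standing in for the automatically set stopping reward.

```python
import numpy as np

# Sketch (not the paper's method): a centralized Q-learner over joint
# actions, accelerated by supplemental independent per-agent Q-learners
# that are halted once an episode reward clears STOP_REWARD.

N_ACTIONS = 3
# Joint payoff: both agents are rewarded for coordinating on action 2.
PAYOFF = np.array([[1, 0, 0],
                   [0, 2, 0],
                   [0, 0, 5]], dtype=float)

ALPHA, EPS, STOP_REWARD, EPISODES = 0.1, 0.2, 4.0, 5000
rng = np.random.default_rng(0)

q_central = np.zeros((N_ACTIONS, N_ACTIONS))        # Q over joint actions
q_indep = [np.zeros(N_ACTIONS) for _ in range(2)]   # one Q per agent
indep_active = True

for _ in range(EPISODES):
    if indep_active and rng.random() < 0.5:
        # Independent learners pick their own actions (epsilon-greedy).
        acts = [q.argmax() if rng.random() > EPS else rng.integers(N_ACTIONS)
                for q in q_indep]
    else:
        # Centralized epsilon-greedy choice over the joint action space.
        if rng.random() > EPS:
            acts = list(np.unravel_index(q_central.argmax(), q_central.shape))
        else:
            acts = list(rng.integers(N_ACTIONS, size=2))
    r = PAYOFF[acts[0], acts[1]]
    q_central[acts[0], acts[1]] += ALPHA * (r - q_central[acts[0], acts[1]])
    for i in range(2):
        q_indep[i][acts[i]] += ALPHA * (r - q_indep[i][acts[i]])
    if indep_active and r >= STOP_REWARD:
        indep_active = False  # stop the independent learners in time

print("greedy joint action:",
      np.unravel_index(q_central.argmax(), q_central.shape))
```

In this sketch, the independent learners bias exploration toward per-agent actions that look individually promising, which helps the centralized learner discover the coordinated joint action sooner; stopping them in time prevents their independent updates from steering learning toward a suboptimal equilibrium, which is the failure mode the abstract alludes to.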
A distributed welfare game is a game-theoretic model of a resource allocation problem whose goal is to find an allocation that maximizes the system operator's objective function. To determine an allocation in a distributed way, each agent is assigned an admissible utility function such that the resulting game possesses desirable properties, for example scalability, efficiency of pure Nash equilibria, and budget balance. To this end, a marginal contribution-based utility design is proposed. This utility function requires less computational effort than previous designs while achieving the same efficiency as the conventional utility design based on the Shapley value.
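For reference, the marginal contribution utility that is standard in the distributed welfare game literature (the paper's exact formulation may differ) assigns each agent the change in system welfare caused by its own participation:

$$U_i(a_i, a_{-i}) \;=\; W(a_i, a_{-i}) \;-\; W(a_i^{\varnothing}, a_{-i}),$$

where $W$ is the operator's objective function and $a_i^{\varnothing}$ denotes agent $i$ choosing the null (opt-out) allocation. Evaluating this utility takes only two evaluations of $W$ per agent, whereas a Shapley-value utility averages an agent's marginal contribution over all subsets (orderings) of the other agents, which is exponential in the number of agents; this is consistent with the claimed computational saving at equal equilibrium efficiency.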