A Study on Multi-Objective Deep Reinforcement Learning for Optimizing Crowd Guidance Strategies

西田 遼; 谷垣 勇輝; 大西 正輝; 橋本 浩一

doi:10.11517/pjsai.JSAI2022.0_3G4OS15b01

Abstract

The objective of this study is to improve Multi-Objective Deep Reinforcement Learning (MODRL) for optimizing crowd guidance strategies. MODRL methods are generally classified into outer-loop and inner-loop methods. Outer-loop methods use a scalarization function to transform the multiple objective functions into a single objective. They obtain the Pareto front, the set of Pareto-optimal solutions, by repeatedly updating the weights of the scalarization function and solving the resulting single-objective problem each time. When each single-objective optimization is computationally expensive, however, the overall cost grows in proportion to the number of weight updates. Inner-loop methods, in contrast, are designed to learn the Pareto front within a single learning process. In this study, we examine how well Pareto-DQN, a representative inner-loop method, approximates the Pareto solutions under different action-selection methods. In the experiments, we evaluate the proposed method on a benchmark problem and finally discuss its application to optimizing crowd guidance strategies.
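The following is a minimal Python sketch of the outer-loop idea and the Pareto-dominance test it relies on. It is not the authors' implementation. It assumes a hypothetical toy setting: a finite set of candidate solutions scored on two objectives to be maximized, and a weighted-sum scalarization. In real MODRL, each "single-objective optimization" inside the loop would be a full training run. The helper `nondominated_actions` is likewise a hypothetical illustration of dominance-based action selection over per-action Q-vectors. The abstract does not specify which action-selection methods Pareto-DQN uses.

```python
"""Outer-loop multi-objective optimization on a toy problem (illustrative sketch).

Assumptions (hypothetical, not from the paper): a finite candidate set,
two objectives to maximize, and weighted-sum (linear) scalarization.
"""
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a Pareto-dominates b (maximization): >= everywhere, > somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Vector]) -> List[Vector]:
    """Return the non-dominated points of a finite set, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarize(v: Vector, w: Vector) -> float:
    """Weighted-sum scalarization: collapse an objective vector to one number."""
    return sum(wi * vi for wi, vi in zip(w, v))


def outer_loop(candidates: Sequence[Vector], n_weights: int) -> Tuple[List[Vector], int]:
    """Sweep n_weights evenly spaced weight vectors for two objectives.

    Each weight vector triggers one single-objective optimization (here a cheap
    argmax; in MODRL, a full training run). Returns the distinct solutions found
    and the number of single-objective optimizations performed.
    """
    if n_weights < 2:
        raise ValueError("n_weights must be at least 2")
    found: List[Vector] = []
    n_solves = 0
    for k in range(n_weights):
        w1 = k / (n_weights - 1)
        w = (w1, 1.0 - w1)
        best = max(candidates, key=lambda v: scalarize(v, w))
        n_solves += 1  # cost grows linearly with the number of weights
        if best not in found:
            found.append(best)
    return found, n_solves


def nondominated_actions(q_vectors: Sequence[Vector]) -> List[int]:
    """Hypothetical dominance-based action filter over per-action Q-vectors."""
    return [i for i, q in enumerate(q_vectors)
            if not any(dominates(other, q) for other in q_vectors)]


if __name__ == "__main__":
    C = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.5), (0.5, 0.8), (0.45, 0.45), (0.3, 0.3)]
    print("Pareto front:", pareto_front(C))
    print("2 weights:  ", outer_loop(C, 2))
    print("101 weights:", outer_loop(C, 101))
```

With only two weight vectors, the sweep recovers just the two extreme solutions, `(0.0, 1.0)` and `(1.0, 0.0)`. Recovering the interior trade-offs `(0.8, 0.5)` and `(0.5, 0.8)` requires a finer sweep, and each extra weight costs one more full single-objective optimization. That is the cost growth the abstract attributes to outer-loop methods.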
