Host : The Japanese Society for Artificial Intelligence
Name : The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 36
Location : [in Japanese]
Date : June 14, 2022 - June 17, 2022
The objective of this study is to improve Multi-Objective Deep Reinforcement Learning (MODRL) for optimizing crowd guidance strategies. MODRL methods are generally classified as either outer-loop or inner-loop methods. An outer-loop method transforms the multiple objective functions into a single objective using a scalarization function, and obtains the Pareto front, which is the set of optimal solutions, by repeatedly updating the weights of the scalarization function and performing single-objective optimization for each weight setting. With this approach, however, if a single run of single-objective optimization is expensive, the overall computational cost grows in proportion to the number of weight updates. An inner-loop method, in contrast, is designed to learn the Pareto front within a single learning process. In this study, we examine how well the Pareto front is approximated under different action selection methods in Pareto-DQN, a typical inner-loop method. In the experiments, we evaluate the proposed method on a benchmark problem, and we finally discuss its application to the optimization of crowd guidance strategies.
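The outer-loop procedure described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and involves no reinforcement learning: the candidate return vectors, the weight grid, and the function names `dominates`, `pareto_front`, and `outer_loop_sweep` are all hypothetical, and linear scalarization over a fixed set of points stands in for the expensive single-objective optimization. Both objectives are assumed to be maximized.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Vector]) -> List[Vector]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def outer_loop_sweep(points: Sequence[Vector],
                     weight_grid: Sequence[Vector]) -> List[Vector]:
    """One single-objective solve per weight vector (here: an argmax over points).

    In MODRL each solve would be a full training run, so the total cost
    scales with len(weight_grid).
    """
    found: List[Vector] = []
    for w in weight_grid:
        # Linear scalarization: weighted sum of the objectives (an assumption).
        best = max(points, key=lambda p: sum(wi * pi for wi, pi in zip(w, p)))
        if best not in found:
            found.append(best)
    return found


# Hypothetical two-objective return vectors, for illustration only.
candidates = [(1.0, 5.0), (3.0, 4.0), (4.0, 2.0), (5.0, 0.0), (2.0, 2.0)]
weight_grid = [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]

front = pareto_front(candidates)
found = outer_loop_sweep(candidates, weight_grid)
print("Pareto front:", front)
print("Found by weight sweep:", found)
```

On this toy data the sweep needs five separate solves and still returns only a subset of the front, which is the kind of cost and coverage trade-off that motivates learning the front directly in an inner-loop method.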