2023 Volume 18 Issue 8 Pages 884-894
The authors used a data-driven reinforcement learning model for the rapid post-disaster recovery of human mobility, in which the reward framework is built from three recovery components: the human-mobility recovery rate, road connectivity, and travel cost. Each component carries a relative importance with respect to the others. However, when the preference among these components differs from the original setting, the model may fail to identify the optimal policy. This limitation must be addressed to improve the robustness and generalizability of the proposed deep Q-network model. Therefore, envelope multi-objective reinforcement learning was applied to identify a set of optimal policies over a predetermined preference space and to evaluate the underlying importance of each component. The resulting agent could distinguish the importance of each damaged road under a given relative preference and derive a road-recovery policy suited to each criterion. Furthermore, the authors provided guidelines for constructing an optimal road-management plan. With the generalized policy network, the government can access diverse restoration strategies and select the one most appropriate to the disaster situation.
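To illustrate how a single preference-conditioned agent can serve many preferences, the following is a minimal sketch of the general envelope Q-learning idea, not the authors' implementation. Several assumptions apply. A small dictionary stands in for the deep Q-network. The three objectives are ordered as (mobility recovery rate, road connectivity, travel cost), and cost enters as a negative value. The names `scalarize`, `greedy_action`, `envelope_target`, and `sample_preferences` are hypothetical. Any numbers used with it are illustrative only. The key step is that the Bellman target uses the vector Q-value that maximizes the current preference's weighted sum over both next actions and other sampled preferences.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Assumed objective order: (mobility recovery rate, road connectivity, travel cost);
# travel cost is stored as a negative value so that "larger is better" for all three.
Vector = Tuple[float, ...]


def scalarize(weights: Sequence[float], values: Sequence[float]) -> float:
    """Linear scalarization w^T v of a multi-objective value vector."""
    return sum(w * v for w, v in zip(weights, values))


def greedy_action(q_s: Dict[int, Vector], preference: Vector) -> int:
    """Pick the action (e.g. which damaged road to repair next) that maximizes
    the preference-weighted sum of its vector Q-value."""
    return max(q_s, key=lambda a: scalarize(preference, q_s[a]))


def envelope_target(
    reward: Vector,
    next_q: Dict[Tuple[int, Vector], Vector],
    preference: Vector,
    gamma: float,
) -> Vector:
    """Envelope Bellman target r + gamma * Q(s', a*, w*), where (a*, w*) maximize
    preference^T Q(s', a, w') over all next actions a and sampled preferences w'.
    `next_q` is a hypothetical stand-in for the network evaluated at s'."""
    best_vec = max(next_q.values(), key=lambda q: scalarize(preference, q))
    return tuple(r + gamma * q for r, q in zip(reward, best_vec))


def sample_preferences(n: int, k: int, rng: random.Random) -> List[Vector]:
    """Draw n preference vectors uniformly from the (k-1)-simplex by normalizing
    i.i.d. Exp(1) draws (equivalent to a flat Dirichlet distribution)."""
    prefs = []
    for _ in range(n):
        draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
        total = sum(draws)
        prefs.append(tuple(d / total for d in draws))
    return prefs
```

With made-up Q-values, a preference weighted toward mobility recovery and one weighted toward connectivity select different next-step vectors. This is the behavior the abstract describes: one trained network yields different road-recovery policies for different criteria.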