2023 Volume 9 Issue 2 Pages A_110-A_120
This study proposes a decomposition of the action-value in a Markov decision process (MDP) for decentralized network-wide signal control by multi-agent reinforcement learning. The proposed decentralized control is theoretically proven to yield the optimal signal control for the entire network. A recent trend in reinforcement learning for signal control is the cooperative multi-agent system, which captures interactions among multiple intersections in order to establish system-optimal control. However, cooperative arrangements still do not guarantee system-optimal control. This study therefore reconsiders the structure of the MDP and shows that the action-value can be decomposed so that system-optimal control is obtained in a decentralized fashion, in which each intersection decides its own action and updates its own action-value using only the state and action associated with itself. The proposed decentralized control is tested on an arterial signal control problem, and the results confirm that it successfully produces system-optimal signal control. Issues concerning the control parameters obtained from reinforcement learning are also summarized.
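
To illustrate the kind of decomposition described above, the following is a minimal sketch and not the paper's algorithm. It assumes, purely for illustration, that the network action-value is the sum of per-intersection values, each depending only on that intersection's own state and action. Under that assumption, each intersection's local greedy choice coincides with the centralized greedy joint action, and each table can be updated from local quantities alone. The function names, the table sizes, the local reward `r_i`, and the learning parameters `alpha` and `gamma` are all hypothetical.

```python
import itertools
import random


def local_greedy(q_i, s_i):
    """Greedy action of one agent, read from its own table q_i[s_i][a_i]."""
    row = q_i[s_i]
    return max(range(len(row)), key=lambda a: row[a])


def joint_greedy_bruteforce(q_tables, states):
    """Centralized reference: enumerate every joint action and
    maximize the (assumed) additive network action-value."""
    n_actions = [len(q[s]) for q, s in zip(q_tables, states)]
    best = max(
        itertools.product(*(range(n) for n in n_actions)),
        key=lambda joint: sum(
            q[s][a] for q, s, a in zip(q_tables, states, joint)
        ),
    )
    return list(best)


def local_update(q_i, s_i, a_i, r_i, s_next_i, alpha=0.1, gamma=0.9):
    """Tabular Q-learning step that uses only agent i's own state,
    action, and (assumed) local reward r_i."""
    target = r_i + gamma * max(q_i[s_next_i])
    q_i[s_i][a_i] += alpha * (target - q_i[s_i][a_i])


random.seed(0)
# Three hypothetical intersections, each with 4 local states and 2 signal phases.
q_tables = [
    [[random.random() for _ in range(2)] for _ in range(4)]
    for _ in range(3)
]
states = [1, 3, 0]

decentralized = [local_greedy(q, s) for q, s in zip(q_tables, states)]
centralized = joint_greedy_bruteforce(q_tables, states)
print(decentralized == centralized)
```

The brute-force search grows exponentially with the number of intersections, whereas the local greedy choice is linear in it, which is the practical appeal of a decomposable action-value.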