Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
The hippocampus is known to be the brain region that replays past experiences. In deep reinforcement learning, experience replay has traditionally been used mainly to improve the sample efficiency of the data used to train artificial neural networks and to keep training samples independent of one another. However, recent neuroscience research has revealed that hippocampal replays occur before the onset of locomotion and involve planning that selects, from among previously experienced paths, the optimal path starting from the current location. Inspired by this phenomenon, we proposed a mechanism within the Deep Q-Network (DQN) framework that reflects previously experienced paths in the current action selection. The mechanism works as follows: first, the replay buffer, which holds previously observed information, is searched for trajectories that start from states similar to the current state; second, the n-step rewards obtained along those past trajectories are added to the action values of the current state, so that past action selections are reflected in the current decision. Our simulation experiments on the CliffWalking task confirmed that the proposed method allows the agent to maximize returns earlier and to reach the terminal state in fewer steps than a standard DQN.
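
The sketch below is a minimal, illustrative rendering of the two-step mechanism described above: matching trajectories in the replay buffer are found by comparing their starting states with the current state, the discounted n-step reward actually obtained along each match is accumulated, and that return is added to the current action values before acting greedily. The Euclidean-distance similarity test, the similarity_eps and n_steps parameters, the use of the maximum when several trajectories match, and all function names are assumptions made for illustration, not the authors' implementation.

import numpy as np

def augmented_action_selection(q_values, current_state, replay_buffer,
                               n_steps=5, gamma=0.99, similarity_eps=0.1):
    """Illustrative sketch: bias DQN action selection with n-step returns
    recovered from similar past trajectories in the replay buffer.

    q_values      : Q(s, a) for the current state, shape (num_actions,)
    current_state : current observation, shape (state_dim,)
    replay_buffer : list of (state, action, reward, next_state, done)
                    transitions stored in the order they were experienced
    """
    bonus = np.zeros_like(q_values, dtype=float)

    for t, (state, action, _, _, _) in enumerate(replay_buffer):
        # Step 1: search the buffer for transitions whose state is similar
        # to the current state (Euclidean distance is an assumption; the
        # abstract only says "states similar to the current state").
        if np.linalg.norm(np.asarray(state) - np.asarray(current_state)) > similarity_eps:
            continue

        # Step 2: follow the stored trajectory forward and accumulate the
        # discounted n-step reward that was actually obtained from here.
        n_step_return, discount = 0.0, 1.0
        for k in range(t, min(t + n_steps, len(replay_buffer))):
            _, _, reward, _, done = replay_buffer[k]
            n_step_return += discount * reward
            discount *= gamma
            if done:
                break

        # Credit the first action of the matched trajectory; taking the
        # maximum over multiple matches is an assumption for illustration.
        bonus[action] = max(bonus[action], n_step_return)

    # Act greedily with respect to the augmented action values.
    return int(np.argmax(q_values + bonus))

In a full agent, a selection routine of this kind would stand in for the usual greedy argmax over Q-values inside the epsilon-greedy step, while training of the DQN itself proceeds unchanged.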