Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
The hippocampus is known to be the brain region that replays past experiences. In deep reinforcement learning, experience replay has traditionally been used mainly to improve the sample efficiency of the data used to train artificial neural networks and to keep training samples independent of one another. However, recent neuroscience research has revealed that hippocampal replays occur before the onset of locomotion and involve planning that selects, from among previously experienced paths, the optimal path starting from the current location. Inspired by this phenomenon, we proposed a mechanism within the Deep Q-Network (DQN) framework that reflects previously experienced paths in the current action selection. The mechanism works as follows: first, the replay buffer, which holds previously observed information, is searched for trajectories that start from states similar to the current state; second, the n-step rewards obtained along those past trajectories are added to the action values of the current state, so that past action selections are reflected in the current decision. Our simulation experiments on the CliffWalking task confirmed that the proposed method allows the agent to maximize returns earlier and to reach the terminal state in fewer steps than a standard DQN.
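
The sketch below is a minimal, illustrative rendering of the two-step mechanism described above: matching trajectories in the replay buffer are found by comparing their starting states with the current state, the discounted n-step reward actually obtained along each match is accumulated, and that return is added to the current action values before acting greedily. The Euclidean-distance similarity test, the similarity_eps and n_steps parameters, the use of the maximum when several trajectories match, and all function names are assumptions made for illustration, not the authors' implementation.

import numpy as np

def augmented_action_selection(q_values, current_state, replay_buffer,
                               n_steps=5, gamma=0.99, similarity_eps=0.1):
    """Illustrative sketch: bias DQN action selection with n-step returns
    recovered from similar past trajectories in the replay buffer.

    q_values      : Q(s, a) for the current state, shape (num_actions,)
    current_state : current observation, shape (state_dim,)
    replay_buffer : list of (state, action, reward, next_state, done)
                    transitions stored in the order they were experienced
    """
    bonus = np.zeros_like(q_values, dtype=float)

    for t, (state, action, _, _, _) in enumerate(replay_buffer):
        # Step 1: search the buffer for transitions whose state is similar
        # to the current state (Euclidean distance is an assumption; the
        # abstract only says "states similar to the current state").
        if np.linalg.norm(np.asarray(state) - np.asarray(current_state)) > similarity_eps:
            continue

        # Step 2: follow the stored trajectory forward and accumulate the
        # discounted n-step reward that was actually obtained from here.
        n_step_return, discount = 0.0, 1.0
        for k in range(t, min(t + n_steps, len(replay_buffer))):
            _, _, reward, _, done = replay_buffer[k]
            n_step_return += discount * reward
            discount *= gamma
            if done:
                break

        # Credit the first action of the matched trajectory; taking the
        # maximum over multiple matches is an assumption for illustration.
        bonus[action] = max(bonus[action], n_step_return)

    # Act greedily with respect to the augmented action values.
    return int(np.argmax(q_values + bonus))

In a full agent, a selection routine of this kind would stand in for the usual greedy argmax over Q-values inside the epsilon-greedy step, while training of the DQN itself proceeds unchanged.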