Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 1B4-GS-2-05

Deep reinforcement learning with planning based on replay of similar experiences
*Shunpei KOSHIKAWA, Jun KUME, Koki HIGUCHI, Tatsuji TAKAHASHI, Hiroyuki OHTA

Abstract

The hippocampus is known to be the brain region that replays past experiences. In deep reinforcement learning, experience replay has traditionally been used mainly to improve the sample efficiency of the data used to train artificial neural networks and to keep the training samples independent of one another. However, recent neuroscience research has revealed that hippocampal replay occurs prior to the onset of locomotion and serves as planning that selects, starting from the current location, the optimal path from among previously experienced paths. Inspired by this phenomenon, we propose a mechanism within the Deep Q-Network (DQN) framework that reflects previously experienced paths in the current action selection. The mechanism works as follows: first, the replay buffer, which holds previously observed information, is searched for trajectories that start from states similar to the current state; second, the n-step rewards obtained along those past trajectories are added to the corresponding action values of the current state, so that past action selections are reflected in the current one. Our simulation experiments on CliffWalking confirmed that the proposed method allows the agent to maximize returns earlier and to reach the terminal state in fewer steps than a standard DQN.
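The Python sketch below illustrates one way such a replay-based planning step could sit on top of a DQN agent's greedy action selection. It is only a minimal sketch under our own assumptions, not the authors' implementation: the function name plan_by_replay, the Euclidean similarity threshold, the (state, action, reward) buffer layout, and the bonus scaling are all illustrative choices.

    import numpy as np
    import torch

    def plan_by_replay(q_network, state, replay_buffer, n_steps=5,
                       gamma=0.99, similarity_radius=0.1, bonus_scale=1.0):
        # Hypothetical sketch: bias DQN action selection with the n-step
        # returns of past trajectories that started from states similar to
        # the current state. replay_buffer is assumed to be a list of
        # (state, action, reward) tuples stored in temporal order.
        state_tensor = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            q_values = q_network(state_tensor).squeeze(0).numpy()

        n_actions = q_values.shape[0]
        bonus = np.zeros(n_actions)
        counts = np.zeros(n_actions)

        # Search the buffer for past states close to the current state.
        for t in range(len(replay_buffer) - n_steps):
            past_state, first_action, _ = replay_buffer[t]
            distance = np.linalg.norm(np.asarray(past_state) - np.asarray(state))
            if distance > similarity_radius:
                continue
            # Discounted n-step return of the trajectory that followed.
            g = 0.0
            for k in range(n_steps):
                _, _, reward = replay_buffer[t + k]
                g += (gamma ** k) * reward
            bonus[first_action] += g
            counts[first_action] += 1

        # Average the replayed returns per action and add them to the Q-values.
        mask = counts > 0
        bonus[mask] /= counts[mask]
        return int(np.argmax(q_values + bonus_scale * bonus))

In this reading, the biased greedy choice would replace the usual argmax over Q-values at decision time, while the learning updates remain those of standard DQN; this division of roles is our assumption based on the abstract.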

© 2023 The Japanese Society for Artificial Intelligence