2026, Vol. 30, No. 1, pp. 301-310
Recently, deep reinforcement learning has demonstrated notable success across various tasks, a success largely attributable to the high expressive power of deep neural networks. However, operating large-scale neural networks requires substantial power, which is a particular challenge for applications with limited power budgets, such as robotic control, where energy efficiency is paramount. By contrast, spiking neural networks (SNNs) have garnered considerable attention owing to their high energy efficiency, particularly when implemented on dedicated neuromorphic hardware. Despite these advantages, conventional methods for integrating SNNs into deep reinforcement learning frameworks frequently struggle with training stability. To address this issue, this study introduces a novel algorithm that incorporates an SNN as the actor network of a twin-delayed deep deterministic policy gradient (TD3) architecture. To further stabilize learning and reduce variance, a burn-in strategy inspired by recurrent experience replay in distributed reinforcement learning is introduced to address the problem of stale membrane potentials stored in the replay buffer: membrane potentials computed with outdated network parameters are re-propagated through the current network so that they align with the updated parameters, improving the accuracy of action-value estimates and strengthening training stability. Furthermore, loss-adjusted prioritization was incorporated to improve learning efficiency and stability. Experimental evaluations in OpenAI Gym environments demonstrated that the proposed method yields higher rewards than conventional approaches.
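To make the burn-in idea concrete, the following is a minimal PyTorch sketch, not the paper's implementation: a toy leaky-integrate-and-fire (LIF) actor whose recurrent state is its membrane potential, and a training-time forward pass that replays a short prefix of the sampled sequence through the current network (without gradients) to refresh the stale potentials before the loss segment. All names here (`LIFActor`, `burn_in_forward`, `beta`, `burn_in_steps`) are hypothetical.

```python
# Sketch of burn-in for an SNN actor, under assumed simplifications:
# a single LIF hidden layer, hard-threshold spikes, and a tanh readout.
import torch
import torch.nn as nn

class LIFActor(nn.Module):
    """Toy SNN actor: one LIF hidden layer followed by a linear readout."""

    def __init__(self, obs_dim, act_dim, hidden=64, beta=0.9, threshold=1.0):
        super().__init__()
        self.fc_in = nn.Linear(obs_dim, hidden)
        self.fc_out = nn.Linear(hidden, act_dim)
        self.beta = beta            # membrane-potential decay factor
        self.threshold = threshold  # firing threshold

    def step(self, obs, v):
        """One simulation step: leaky integration, spiking, soft reset."""
        v = self.beta * v + self.fc_in(obs)     # integrate input
        spikes = (v >= self.threshold).float()  # fire (a real implementation
                                                # would use a surrogate gradient)
        v = v - spikes * self.threshold         # soft reset after firing
        return torch.tanh(self.fc_out(spikes)), v

    def init_state(self, batch):
        return torch.zeros(batch, self.fc_in.out_features)

def burn_in_forward(actor, obs_seq, stored_v, burn_in_steps):
    """Refresh stale membrane potentials before the training segment.

    obs_seq:  (T, B, obs_dim) sequence sampled from the replay buffer.
    stored_v: membrane potentials saved at collection time; stale because
              they were computed with old network parameters.
    The first `burn_in_steps` observations are replayed through the
    *current* network without gradients, so the state entering the loss
    segment is consistent with the updated parameters.
    """
    v = stored_v
    with torch.no_grad():  # burn-in prefix: state refresh only
        for t in range(burn_in_steps):
            _, v = actor.step(obs_seq[t], v)
    actions = []
    for t in range(burn_in_steps, obs_seq.shape[0]):
        a, v = actor.step(obs_seq[t], v)  # loss is computed on this segment
        actions.append(a)
    return torch.stack(actions)

# Usage: T=10 steps, batch of 32, with (here randomly initialized) stale state.
actor = LIFActor(obs_dim=3, act_dim=1)
obs_seq = torch.randn(10, 32, 3)
stale_v = actor.init_state(32)  # in practice, loaded from the replay buffer
acts = burn_in_forward(actor, obs_seq, stale_v, burn_in_steps=4)
```

The no-grad burn-in prefix mirrors the recurrent experience replay (R2D2) recipe the abstract cites, with the LSTM hidden state replaced by the SNN membrane potential; how the refreshed state then feeds the TD3 critic targets and the loss-adjusted prioritization is specific to the paper and not reproduced here.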