Acceleration of Reinforcement Learning for Port-Hamiltonian Systems Using the Natural Gradient

福永 修一; 岩本 有生

doi:10.9746/sicetr.59.70

Abstract

This paper accelerates a reinforcement learning algorithm for port-Hamiltonian systems by using a natural gradient method. The proposed algorithm has an actor-critic structure: the actor generates a control input according to a policy and learns the policy from a temporal difference (TD) error, while the critic computes the TD error and learns a state-value function. The reinforcement learning algorithm for port-Hamiltonian systems has two types of policy parameters, and the proposed algorithm learns both using the natural gradient method. The proposed method was applied in numerical simulation to swing-up control of an inverted pendulum. The simulation results showed that the proposed method learned faster than the existing method.
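
To make the terms in the abstract concrete, the following is a minimal generic sketch of a TD error and a natural-gradient actor update. It is not the authors' update rule: the abstract does not give the equations, and the port-Hamiltonian structure and the two types of policy parameters are not modeled here. The function names, the running Fisher-matrix estimate, and the parameters `alpha`, `beta`, and `eps` are all assumptions introduced for illustration.

```python
import numpy as np


def td_error(reward, value, next_value, gamma):
    """Standard one-step TD error: r + gamma * V(x') - V(x)."""
    return reward + gamma * next_value - value


def natural_gradient_step(theta, fisher, score, delta, alpha, beta=0.01, eps=1e-6):
    """One hypothetical natural-gradient actor update.

    theta  : policy parameter vector
    fisher : running estimate of the Fisher information matrix (assumed)
    score  : gradient of log pi(u | x) with respect to theta
    delta  : TD error supplied by the critic
    """
    # Update the running Fisher estimate with the outer product of the score.
    fisher = (1.0 - beta) * fisher + beta * np.outer(score, score)
    # Natural gradient: precondition the vanilla gradient with the inverse Fisher.
    # eps adds a small ridge term so the linear solve stays well conditioned.
    regularized = fisher + eps * np.eye(len(theta))
    nat_grad = np.linalg.solve(regularized, score * delta)
    return theta + alpha * nat_grad, fisher
```

With an identity Fisher matrix this reduces to the ordinary TD-based policy-gradient step; a non-identity Fisher matrix rescales the step according to how sensitive the policy distribution is to each parameter.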
