Abstract
Actor-critic approaches and natural-gradient-based methods have recently drawn significant interest in the area of reinforcement learning, and several algorithms have been studied along the lines of the natural actor-critic strategy. This paper considers the problem of improving a previously reported recursive-least-squares (RLS)-based natural actor-critic algorithm toward a version that employs learning rate adaptation. In the actor part of the studied algorithm, the policy parameters are updated using the natural gradient together with learning rate adaptation, while in the critic part, the recursive least-squares method is used to estimate the advantage function and the state value function. The applicability of the studied algorithm is illustrated via the locomotion of a two-link robot arm.
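To make the structure described above concrete, the following is a minimal sketch of one generic natural actor-critic loop with an RLS critic and a simple learning-rate adaptation step. The feature maps, dimensions, the `rls_update` helper, and the sign-agreement adaptation rule are all illustrative assumptions, not the specific algorithm studied in the paper.

```python
import numpy as np

def rls_update(w, P, x, target, lam=0.99):
    """One recursive least-squares step fitting target ~ w @ x."""
    Px = P @ x
    k = Px / (lam + x @ Px)           # RLS gain vector
    w = w + k * (target - w @ x)      # correct by the a-priori error
    P = (P - np.outer(k, Px)) / lam   # covariance update with forgetting
    return w, P

# Dimensions and features are stand-ins for the sketch.
n_v, n_th = 4, 3
v, P_v = np.zeros(n_v), 100.0 * np.eye(n_v)     # critic: V(s) ~= v @ phi(s)
w, P_w = np.zeros(n_th), 100.0 * np.eye(n_th)   # critic: A(s,a) ~= w @ psi(s,a)
w_prev = np.zeros(n_th)
theta = np.zeros(n_th)                          # actor (policy) parameters
alpha, gamma = 0.1, 0.95                        # step size, discount factor

rng = np.random.default_rng(0)
for t in range(1000):
    # Stand-ins for one observed transition (s, a, r, s').
    phi, phi_next = rng.normal(size=n_v), rng.normal(size=n_v)  # state features
    psi = rng.normal(size=n_th)   # compatible features: grad_theta log pi(a|s)
    r = rng.normal()              # reward

    # Critic: RLS fit of the state-value function to the TD target,
    # then RLS fit of the advantage function to the TD error.
    td_target = r + gamma * (v @ phi_next)
    v, P_v = rls_update(v, P_v, phi, td_target)
    delta = td_target - v @ phi                 # TD error as an advantage sample
    w, P_w = rls_update(w, P_w, psi, delta)

    # Actor: with compatible features, w itself serves as the natural
    # gradient of the expected return, so the update is theta += alpha * w.
    theta += alpha * w

    # Learning-rate adaptation (sign-agreement heuristic; an assumed
    # stand-in for the paper's rule): grow alpha while successive
    # natural-gradient estimates agree, shrink it when they conflict.
    alpha = min(alpha * 1.05, 1.0) if w_prev @ w > 0 else max(alpha * 0.7, 1e-4)
    w_prev = w.copy()
```

The separation shown here mirrors the abstract: the critic maintains least-squares estimates of the value and advantage functions, while the actor follows the natural gradient with an adaptively scaled step size.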