Abstract
This paper proposes a new reinforcement learning method for constructing agents in environments whose rewards vary cyclically with time. The proposed method consists of two parts: (a) a cyclic action-value function obtained by superposing sinusoidal action-value functions in phasor representation and (b) an algorithm that uses it. Reinforcement learning is a widely used framework for developing agents that can decide on suitable actions. However, conventional reinforcement learning enables an agent to learn suitable actions only in stationary environments. In contrast to conventional methods, the proposed method can be applied to learning in environments with time-dependent cyclic rewards. Experimental results show that the proposed method performs much better than conventional methods.