2019 Volume 139 Issue 7 Pages 847-848
In this paper, we extend our previously proposed reinforcement learning method with multiplex learning spaces to environments in which rewards arrive after a delay. Concretely, we prepare multiple learning spaces, one for each candidate delay time, with the candidates spaced at equal intervals over the predicted delay range. In simulations comparing the extended method with an ordinary reinforcement learning method, the proposed method acquired the best policy, whereas the ordinary method did not.
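The abstract does not give the update rule, so the following is only a minimal sketch of the idea, assuming tabular Q-learning and an integer reward delay measured in time steps. Each learning space is a separate Q-table tied to one candidate delay d, and a reward received now is credited to the transition made d steps earlier. The names `candidate_delays`, `MultiplexDelayedQ`, `alpha`, and `gamma` are hypothetical, and how the original method chooses or combines spaces when acting is not shown here.

```python
from collections import defaultdict, deque


def candidate_delays(lo, hi, step):
    """Hypothetical helper: equally spaced delays covering the predicted range [lo, hi]."""
    return list(range(lo, hi + 1, step))


class MultiplexDelayedQ:
    """Hypothetical sketch: one Q-table (learning space) per candidate reward delay."""

    def __init__(self, delays, n_actions, alpha=0.1, gamma=0.9):
        self.delays = list(delays)
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate (assumed)
        self.gamma = gamma  # discount factor (assumed)
        # Each table maps state -> list of action values, created lazily.
        self.q = {d: defaultdict(lambda: [0.0] * n_actions) for d in self.delays}
        # Recent (state, action, next_state) transitions, newest last.
        self.history = deque(maxlen=max(self.delays) + 1)

    def record(self, state, action, next_state):
        """Store the transition taken at the current time step."""
        self.history.append((state, action, next_state))

    def update(self, reward):
        """Credit the reward received now to the transition d steps ago, in space d."""
        for d in self.delays:
            if d >= len(self.history):
                continue  # not enough history yet for this delay hypothesis
            s, a, s_next = self.history[-1 - d]
            table = self.q[d]
            target = reward + self.gamma * max(table[s_next])
            table[s][a] += self.alpha * (target - table[s][a])

    def greedy(self, d, state):
        """Greedy action in state according to the learning space for delay d."""
        row = self.q[d][state]
        return max(range(self.n_actions), key=row.__getitem__)


agent = MultiplexDelayedQ(candidate_delays(0, 4, 2), n_actions=2, alpha=0.5, gamma=0.0)
agent.record(0, 1, 1)
agent.record(1, 0, 2)
agent.record(2, 1, 3)
agent.update(1.0)  # space 0 credits (2, 1); space 2 credits (0, 1); space 4 waits
```

In this sketch only the table whose delay matches the true delay attributes each reward to the action that caused it; the other tables credit unrelated transitions, which is one plausible reading of why a single ordinary learning space can fail when the delay is uncertain.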
The Transactions of the Institute of Electrical Engineers of Japan. C
The Journal of the Institute of Electrical Engineers of Japan