Host: The Japanese Society for Artificial Intelligence
Name : The 35th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 35
Location : [in Japanese]
Date : June 08, 2021 - June 11, 2021
Reinforcement learning algorithms are categorized into model-based methods, which explicitly estimate an environmental model and a reward function, and model-free methods, which learn a policy directly from real or generated experiences. We previously proposed a parallel reinforcement learning algorithm that trains multiple model-free and model-based learners simultaneously, and the experimental results showed that a simple algorithm can contribute to the learning of more complex ones. However, because the computation time of each learner was not considered, the advantage of using a simple model-free learner could not be fully demonstrated. This paper proposes an asynchronous parallel reinforcement learning method that accounts for differences in control frequency among learners. The main contributions are keeping the replay buffer collected by each learner separate and transforming the stored experiences to absorb the differences in control frequency. The proposed method is applied to benchmark problems and compared with a baseline that ignores the differences in control frequency. The results show that the proposed algorithm selected the simple model-based method with a short control period in the early stage of learning, the complex model-based method in the middle stage, and the model-free method in the late stage.
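The two mechanisms named in the abstract, per-learner replay buffers and an experience transformation that absorbs control-frequency differences, can be illustrated with a minimal sketch. This is not the authors' implementation; the class and function names, the transition layout `(s, a, r, s')`, and the strategy of merging consecutive fine-grained transitions into one coarse transition (accumulating discounted rewards) are all assumptions made for illustration.

```python
import random
from collections import deque

class ReplayBuffer:
    """Replay buffer kept separate for each learner (one instance per learner)."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random minibatch, clipped to the current buffer size.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

def compress_transitions(transitions, ratio, gamma=0.99):
    """Hypothetical frequency-alignment step: merge `ratio` consecutive
    fine-grained transitions (s, a, r, s') from a fast learner into one
    coarse transition usable by a learner with a longer control period.
    The merged reward is the discounted sum of the intermediate rewards."""
    coarse = []
    for i in range(0, len(transitions) - ratio + 1, ratio):
        chunk = transitions[i:i + ratio]
        s, a = chunk[0][0], chunk[0][1]
        r = sum((gamma ** k) * t[2] for k, t in enumerate(chunk))
        s_next = chunk[-1][3]
        coarse.append((s, a, r, s_next))
    return coarse
```

Under this sketch, a fast learner's trajectory is compressed before being added to a slow learner's buffer, so each buffer only ever holds experiences at its own learner's control period.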