Host: The Japan Society of Mechanical Engineers
Name : [in Japanese]
Date : June 01, 2022 - June 04, 2022
Reinforcement learning (RL) in safety-critical domains such as autonomous driving requires safe exploration, but conventional safe reinforcement learning methods perform poorly at the initial stage of learning. In this paper, we present an RL method that selects actions using independent Q-functions for a rule-based policy and an RL policy to address this issue. In our method, the Q-function for the rule-based policy is pre-trained on offline data, while the Q-function for the RL policy is initialized randomly. Our method selects an action by comparing the two Q-functions, which increases the probability of selecting the rule-based action at the initial stage of learning. We conduct experiments on a driving lane selection task and find that our approach improves performance at the initial stage of learning.
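The action-selection idea described above can be sketched as follows. This is a minimal, hedged illustration of one plausible reading, not the authors' implementation: `Q_rule`, `Q_rl`, `rule_based_action`, and all sizes and values here are hypothetical stand-ins (tabular Q-functions, a trivial rule-based policy, and a constant pre-trained table), used only to show the comparison between the two Q-functions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 10, 4

# Hypothetical Q-tables. Q_rule stands in for a Q-function pre-trained on
# offline data; Q_rl stands in for a randomly initialized Q-function.
Q_rule = np.full((N_STATES, N_ACTIONS), 1.0)
Q_rl = rng.normal(scale=0.01, size=(N_STATES, N_ACTIONS))

def rule_based_action(state: int) -> int:
    # Stand-in for a deterministic rule-based policy (hypothetical).
    return state % N_ACTIONS

def select_action(state: int) -> int:
    """Pick between the rule-based action and the RL greedy action by
    comparing the values each Q-function assigns to its own action."""
    a_rule = rule_based_action(state)
    a_rl = int(np.argmax(Q_rl[state]))
    if Q_rule[state, a_rule] >= Q_rl[state, a_rl]:
        return a_rule  # pre-trained Q dominates early in training
    return a_rl

# Early in training, the near-zero random Q_rl loses the comparison,
# so the rule-based action is selected in every state.
actions = [select_action(s) for s in range(N_STATES)]
```

Under this toy setup, every comparison favors the pre-trained table, mirroring the claim that the rule-based action is selected with high probability at the initial stage; as `Q_rl` is updated during learning, its values can overtake `Q_rule` and the RL action takes over.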