Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
In reinforcement learning, a policy is normally learned in a simulation environment and then applied to the real world, for reasons of cost and safety. However, the learned policy often cannot adapt, because real-world disturbances and failures create gaps between the two environments. To narrow this gap, a policy that can adapt to various scenarios is needed. In this paper, we propose a reinforcement learning method for acquiring a policy that is robust against failures. In the proposed method, a failure is represented by adjusting the physical parameters of the robot, and reinforcement learning under various faults is achieved by randomizing these physical parameters during training. In experiments on a quadruped walking task in a simulation environment, we show that a robot trained with the proposed method obtains higher average rewards than a normally trained robot, both with and without robot failures.
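The following is a minimal sketch of the failure-randomization idea described above: before each training episode, the robot's physical parameters are perturbed to represent a possible fault, so the policy is optimized over a distribution of failure scenarios. The toy environment, the number of joints, the per-joint torque gains, and the failure probability are all illustrative assumptions, not the paper's actual simulator or parameter settings.

```python
import numpy as np

rng = np.random.default_rng(0)


class ToyQuadrupedEnv:
    """Hypothetical stand-in for the quadruped simulator (not the paper's environment).

    The robot is reduced to per-joint torque gains so that the
    failure-randomization loop can be shown end to end.
    """
    NUM_JOINTS = 8

    def __init__(self):
        self.nominal_gain = np.ones(self.NUM_JOINTS)
        self.gain = self.nominal_gain.copy()
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(self.NUM_JOINTS)              # dummy observation

    def step(self, action):
        self.t += 1
        effective = self.gain * action                # failed joints attenuate torque
        reward = float(np.sum(effective))             # dummy "forward progress" reward
        done = self.t >= 100
        return np.zeros(self.NUM_JOINTS), reward, done


def sample_failure(env, failure_prob=0.3):
    """Randomize physical parameters to represent a failure for one episode.

    Each joint independently fails with probability `failure_prob` (an assumed
    value); a failed joint's torque gain is scaled to a random value in [0, 1),
    where 0 models a completely broken actuator.
    """
    gain = env.nominal_gain.copy()
    broken = rng.random(env.NUM_JOINTS) < failure_prob
    gain[broken] = rng.uniform(0.0, 1.0, size=int(broken.sum()))
    env.gain = gain


# Training outline: every episode runs under a freshly sampled failure, so the
# policy is trained for expected return over the failure distribution.
env = ToyQuadrupedEnv()
for episode in range(10):
    sample_failure(env)
    obs = env.reset()
    done = False
    while not done:
        action = rng.uniform(-1.0, 1.0, env.NUM_JOINTS)   # placeholder policy
        obs, reward, done = env.step(action)
        # ... RL agent update (e.g., PPO or SAC) would go here ...
```

In this sketch the failure distribution covers both the nominal robot (no joint fails) and degraded robots, which mirrors the evaluation setting of the paper: the learned policy is tested in the simulation environment both with and without robot failures.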