Host: The Japanese Society for Artificial Intelligence
Name: The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 2019
Number: 33
Location: [in Japanese]
Date: June 04, 2019 - June 07, 2019
Learning interpretable policies for control problems is important for meeting industrial requirements such as safety and maintainability. A common approach to acquiring an interpretable policy is to learn a decision tree that imitates a black-box policy (e.g., one based on a neural network) trained to maximize the expected reward in a given environment. However, such approximated decision tree policies are suboptimal in the sense that they do not necessarily maximize the expected reward themselves. In this paper, we propose a method for learning a decision tree policy that directly maximizes the expected reward using the cross-entropy method. Our experimental results show that our method acquires interpretable decision tree policies that outperform baseline policies learned by imitation.
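To illustrate the idea of directly optimizing a tree-structured policy with the cross-entropy method, here is a minimal sketch on a hypothetical toy task (not the paper's actual environments or algorithm details): the policy is a depth-1 decision tree with a single split threshold, and CEM iteratively samples candidate thresholds from a Gaussian, scores each by its average reward, and refits the Gaussian to the elite candidates.

```python
import random
import statistics

random.seed(0)

def episode_return(threshold, n_steps=100):
    """Average reward of a depth-1 tree policy on a toy task:
    reward 1 when the tree picks action 1 for positive states
    and action 0 for negative states (optimal threshold is 0)."""
    total = 0
    for _ in range(n_steps):
        s = random.uniform(-1.0, 1.0)
        action = 1 if s > threshold else 0
        total += 1 if action == (1 if s > 0 else 0) else 0
    return total / n_steps

def cem(n_iters=30, pop=50, elite_frac=0.2):
    """Cross-entropy method over the tree's split threshold."""
    mu, sigma = 0.5, 1.0          # initial sampling distribution
    n_elite = int(pop * elite_frac)
    for _ in range(n_iters):
        # Sample candidate thresholds and rank them by reward.
        samples = [random.gauss(mu, sigma) for _ in range(pop)]
        ranked = sorted(samples, key=episode_return, reverse=True)
        elites = ranked[:n_elite]
        # Refit the sampling distribution to the elite set.
        mu = statistics.mean(elites)
        sigma = statistics.stdev(elites) + 1e-3   # floor keeps exploration alive
    return mu

best = cem()
print(best)  # converges near 0.0, the reward-maximizing split
```

The key contrast with the imitation approach is visible here: no reference black-box policy is ever consulted; the tree parameter is scored only by the reward it earns. The actual method would optimize all split features, thresholds, and leaf actions of a deeper tree.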