Host: The Japanese Society for Artificial Intelligence
Name: The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 2019
Number: 33
Location: [in Japanese]
Date: June 04, 2019 - June 07, 2019
Learning interpretable policies for control problems is important for meeting industrial requirements such as safety and maintainability. A common approach to acquiring an interpretable policy is to learn a decision tree that imitates a black-box policy (e.g., one based on a neural network) trained to maximize the expected reward in a given environment. However, such approximated decision tree policies are suboptimal in the sense that they do not necessarily maximize the expected reward themselves. In this paper, we propose a method for learning a decision tree policy that directly maximizes the expected reward using the cross-entropy method. Our experimental results show that our method acquires interpretable decision tree policies that outperform baseline policies learned by imitation.
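To illustrate the idea of directly optimizing a tree-structured policy with the cross-entropy method, here is a minimal sketch on a hypothetical toy task (not the paper's actual environments or algorithm details): the policy is a depth-1 decision tree with a single split threshold, and CEM iteratively samples candidate thresholds from a Gaussian, scores each by its average reward, and refits the Gaussian to the elite candidates.

```python
import random
import statistics

random.seed(0)

def episode_return(threshold, n_steps=100):
    """Average reward of a depth-1 tree policy on a toy task:
    reward 1 when the tree picks action 1 for positive states
    and action 0 for negative states (optimal threshold is 0)."""
    total = 0
    for _ in range(n_steps):
        s = random.uniform(-1.0, 1.0)
        action = 1 if s > threshold else 0
        total += 1 if action == (1 if s > 0 else 0) else 0
    return total / n_steps

def cem(n_iters=30, pop=50, elite_frac=0.2):
    """Cross-entropy method over the tree's split threshold."""
    mu, sigma = 0.5, 1.0          # initial sampling distribution
    n_elite = int(pop * elite_frac)
    for _ in range(n_iters):
        # Sample candidate thresholds and rank them by reward.
        samples = [random.gauss(mu, sigma) for _ in range(pop)]
        ranked = sorted(samples, key=episode_return, reverse=True)
        elites = ranked[:n_elite]
        # Refit the sampling distribution to the elite set.
        mu = statistics.mean(elites)
        sigma = statistics.stdev(elites) + 1e-3   # floor keeps exploration alive
    return mu

best = cem()
print(best)  # converges near 0.0, the reward-maximizing split
```

The key contrast with the imitation approach is visible here: no reference black-box policy is ever consulted; the tree parameter is scored only by the reward it earns. The actual method would optimize all split features, thresholds, and leaf actions of a deeper tree.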