2023 Volume E106.D Issue 6 Pages 1130-1141
The increasing attention to the interpretability of machine learning models has led to the development of methods that explain the behavior of black-box models in a post-hoc manner. However, such post-hoc approaches generate a new explanation for every new input, and these explanations cannot be checked by humans in advance. A method that selects decision rules from a finite ruleset as explanations for neural networks has been proposed, but it cannot be applied to other models. In this paper, we propose a model-agnostic explanation method that finds a pre-verifiable finite ruleset from which a decision rule is selected to support every prediction made by a given black-box model. First, we define an explanation model that selects, from a ruleset, the rule giving the prediction closest to that of the black-box model; this rule serves as an alternative explanation or supporting evidence for the black-box prediction. The ruleset should have high coverage so that it gives close predictions for future inputs, but it should also be small enough for humans to check in advance. However, minimizing the ruleset while maintaining high coverage leads to a computationally hard combinatorial problem. We therefore show that this problem can be reduced to a weighted MaxSAT problem composed only of Horn clauses, which modern solvers can solve efficiently. Experimental results show that our method finds small rulesets whose selected rules achieve higher accuracy on structured data than the existing method with rulesets of almost the same size. We also experimentally compare the proposed method with two purely rule-based models, CORELS and defragTrees. Furthermore, we examine rulesets constructed for real datasets and discuss the characteristics of the proposed method from several viewpoints, including interpretability, limitations, and possible use cases.
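To make the rule-selection step concrete, here is a minimal Python sketch under stated assumptions. The `Rule` class, its interval-condition format, the absolute-difference loss, and the `select_rule` and `coverage` helpers are hypothetical illustrations, not the paper's definitions. Each rule predicts a single value; for a given input, the sketch picks, among the rules whose conditions the input satisfies, the one whose prediction is closest to the black-box output, and coverage counts the inputs for which the selected rule matches that output. The sketch does not implement the weighted MaxSAT search used in the paper to choose the ruleset.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

INF = float("inf")


# Hypothetical rule format (an assumption, not the paper's): a conjunction of
# interval conditions (feature_index, low, high), read as low <= x[i] < high,
# plus the value the rule predicts when all conditions hold.
@dataclass(frozen=True)
class Rule:
    conditions: tuple
    prediction: float

    def applies(self, x: Sequence[float]) -> bool:
        return all(low <= x[i] < high for i, low, high in self.conditions)


def select_rule(
    ruleset: Sequence[Rule],
    x: Sequence[float],
    blackbox_pred: float,
    loss: Callable[[float, float], float] = lambda a, b: abs(a - b),
) -> Optional[Rule]:
    """Among rules that apply to x, return the one closest to the black-box output."""
    candidates = [r for r in ruleset if r.applies(x)]
    if not candidates:
        return None  # no rule in the ruleset covers this input
    return min(candidates, key=lambda r: loss(r.prediction, blackbox_pred))


def coverage(
    ruleset: Sequence[Rule],
    inputs: Sequence[Sequence[float]],
    blackbox: Callable[[Sequence[float]], float],
    tol: float = 0.0,
) -> float:
    """Fraction of inputs whose selected rule is within tol of the black-box output."""
    hits = 0
    for x in inputs:
        y = blackbox(x)
        r = select_rule(ruleset, x, y)
        if r is not None and abs(r.prediction - y) <= tol:
            hits += 1
    return hits / len(inputs)


# Toy usage: a stand-in "black box" and three hand-written rules.
def blackbox(x: Sequence[float]) -> float:
    return 1.0 if x[0] + x[1] > 1 else 0.0


r1 = Rule(((0, 0.5, INF),), 1.0)   # x0 >= 0.5  -> predict 1
r2 = Rule(((0, -INF, 0.5),), 0.0)  # x0 <  0.5  -> predict 0
r3 = Rule(((1, 0.5, INF),), 1.0)   # x1 >= 0.5  -> predict 1
inputs = [(0.2, 0.1), (0.8, 0.9), (0.3, 0.9), (0.9, 0.05)]

print(coverage([r1, r2], inputs, blackbox))      # 0.5
print(coverage([r1, r2, r3], inputs, blackbox))  # 0.75
```

In this toy run, adding the third rule raises coverage from 0.5 to 0.75 at the cost of a larger ruleset. This is the size-versus-coverage trade-off that the paper's weighted MaxSAT formulation is used to optimize.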