1999 Volume 14 Issue 1 Pages 139-147
This paper presents an evaluation method for discovering probabilistic if-then rules with high reliability from data sets. The discovery of probabilistic if-then rules, each of which is a restricted form of a characteristic production rule, is motivated by useful applications such as semantic query optimization and the automatic development of knowledge bases. A discovery algorithm evaluates a production rule by its generality and its accuracy, since these are widely accepted criteria in inductive learning. Here, evaluating the reliability of both criteria is mandatory for distinguishing reliable rules from unreliable patterns without overwhelming users. However, previous approaches to discovering characteristic rules have either ignored reliability evaluation or evaluated the reliability of generality only. Consequently, they tend to discover a huge number of rules, some of which have unreliable accuracies. To circumvent these difficulties, we propose an approach based on simultaneous estimation. Using normal approximations of multinomial distributions, our approach discovers the rules that exceed pre-specified thresholds for generality and accuracy with high reliability. A novel pruning method improves time efficiency without changing the discovery outcome. The proposed approach has been validated experimentally on 21 benchmark data sets from the machine learning community.
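The idea of accepting a rule only when both generality and accuracy clear their thresholds "with high reliability" can be illustrated with a small sketch. It is not the paper's simultaneous-estimation procedure, whose exact formulas are not given in this abstract. Several assumptions are made here: generality is taken to be p(y) and accuracy p(x | y) for a rule "if y then x"; each is bounded from below with a one-sided normal (Wald) approximation; and the two bounds are combined with a Bonferroni split of the error level delta, a swapped-in technique chosen for simplicity. The function names `lower_bound` and `is_reliable_rule` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lower_bound(successes: int, trials: int, z: float) -> float:
    """One-sided lower confidence bound for a proportion (normal/Wald approximation)."""
    p_hat = successes / trials
    return p_hat - z * sqrt(p_hat * (1.0 - p_hat) / trials)


def is_reliable_rule(n_xy: int, n_y: int, n: int,
                     theta_g: float, theta_a: float, delta: float = 0.05) -> bool:
    """Hypothetical check for the rule "if y then x".

    n_xy: examples satisfying both y and x
    n_y:  examples satisfying the premise y
    n:    total number of examples
    Returns True if the lower bounds on generality p(y) and accuracy p(x | y)
    both reach their thresholds; joint confidence is approximately 1 - delta.
    """
    if n <= 0 or not (0 <= n_xy <= n_y <= n):
        raise ValueError("counts must satisfy 0 <= n_xy <= n_y <= n and n > 0")
    if n_y == 0:
        return False  # the premise never occurs, so accuracy cannot be estimated
    # Bonferroni assumption: give each one-sided bound an error budget of delta / 2.
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)
    generality_lb = lower_bound(n_y, n, z)     # bound on p(y)
    accuracy_lb = lower_bound(n_xy, n_y, z)    # bound on p(x | y), conditioned on n_y
    return generality_lb >= theta_g and accuracy_lb >= theta_a


# With 1000 examples, 300 matching y and 270 of those also matching x, the
# estimated generality (0.3) and accuracy (0.9) clear 0.25 and 0.85 even after
# the reliability margin, but the accuracy bound (about 0.866) falls short of 0.88.
print(is_reliable_rule(270, 300, 1000, theta_g=0.25, theta_a=0.85))  # True
print(is_reliable_rule(270, 300, 1000, theta_g=0.25, theta_a=0.88))  # False
```

The point of the sketch is that a rule whose raw estimates exceed the thresholds can still be rejected once sampling uncertainty is accounted for. The Wald bound is a deliberately simple choice; it degenerates when an estimated proportion is exactly 0 or 1, where a Wilson score bound would be safer.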