CREDOS: Classification using ripple down structure (a case for rare classes)
Abstract
Ripple down rules (RDRs) are commonly used by the expert systems community because they make knowledge bases easy to use and efficient to maintain. We observe that RDRs offer a unique tree-based representation that generalizes the decision tree and disjunctive normal form (DNF) rule-based models, and specializes a generic form of the PNrule model. In this paper, we explore their use for learning predictive classifier models. Such models must be able to generalize, a capability most commonly achieved with the help of pruning methods. Existing RDR induction algorithms were developed to build an initial knowledge base that humans then use and modify until it explains every case correctly. They do not treat an RDR as a predictive model, and hence offer few safeguards against over-fitting. Existing pruning strategies developed by the data mining community cannot be applied directly to an RDR structure because of the uniqueness of both the structure and its prediction process. We therefore propose a novel induction algorithm, CREDOS. Its key characteristic is a generic pruning framework, for which we provide a specific instantiation based on the minimum description length (MDL) principle. Using real-world datasets requiring prediction of rare classes, we compare CREDOS to other state-of-the-art algorithms. CREDOS performs significantly better than or comparably to these algorithms, especially in predicting a wide variety of rarely occurring events.
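To make the tree-based representation concrete, the following is a minimal sketch of how a binary ripple down rule structure is commonly evaluated: each node holds a rule, a satisfied rule hands the case to its "except" child, an unsatisfied rule hands it to its "else" child, and the prediction is the conclusion of the last satisfied rule on the path. The names `RDRNode`, `classify`, and the example attributes (`bytes`, `user`, `failed_logins`) are hypothetical and chosen only for illustration. This sketch shows prediction only; it is not the CREDOS induction or pruning procedure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RDRNode:
    """One rule in a ripple down structure (hypothetical illustration)."""
    condition: Callable[[dict], bool]           # test applied to a case
    conclusion: str                             # class label if the test holds
    except_branch: Optional["RDRNode"] = None   # followed when the test holds
    else_branch: Optional["RDRNode"] = None     # followed when the test fails


def classify(node: Optional[RDRNode], case: dict, default: str) -> str:
    """Return the conclusion of the last satisfied rule on the path."""
    label = default
    while node is not None:
        if node.condition(case):
            label = node.conclusion      # this rule fires; exceptions may override it
            node = node.except_branch
        else:
            node = node.else_branch      # this rule does not apply; try the alternative
    return label


# Assumed toy knowledge base: a default rule, refined by exceptions.
admin_exception = RDRNode(lambda c: c["user"] == "admin", "normal")
login_rule = RDRNode(lambda c: c["failed_logins"] > 3, "attack")
traffic_rule = RDRNode(
    lambda c: c["bytes"] > 1000,
    "attack",
    except_branch=admin_exception,
    else_branch=login_rule,
)
root = RDRNode(lambda c: True, "normal", except_branch=traffic_rule)

print(classify(root, {"bytes": 2000, "user": "guest", "failed_logins": 0}, "unknown"))
```

Because every exception is attached to the rule it refines, an over-specific exception learned from a few training cases changes predictions only within that rule's context, which is why pruning such a structure differs from pruning a decision tree or a flat rule list.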