Data Mining Algorithms In R/Classification/JRip

Synopsis
This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP. It is based in association rules with reduced error pruning (REP), a very common and effective technique found in decision tree algorithms. In REP for rules algorithms, the training data is split into a growing set and a pruning set. First, an initial rule set is formed that over ts the growing set, using some heuristic method. This overlarge rule set is then repeatedly simplified by applying one of a set of pruning operators typical pruning operators would be to delete any single condition or any single rule. At each stage of simplification, the pruning operator chosen is the one that yields the greatest reduction of error on the pruning set. Simplification ends when applying any pruning operator would increase error on the pruning set.

The algorithm is briefly described as follows: Initialize RS = {}, and for each class from the less prevalent one to the more frequent one, DO:

1. Building stage:
Repeat 1.1 and 1.2 until the description length (DL) of the rule set and examples is 64 bits greater than the smallest DL met so far, or there are no positive examples, or the error rate >= 50%.

1.1. Grow phase:
Grow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with highest information gain: p(log(p/t)-log(P/T)).

1.2. Prune phase:
Incrementally prune each rule and allow the pruning of any final sequences of the antecedents;The pruning metric is (p-n)/(p+n) – but it's actually 2p/(p+n) -1, so in this implementation we simply use p/(p+n) (actually (p+1)/(p+n+2), thus if p+n is 0, it's 0.5).

2. Optimization stage:
after generating the initial rule set {Ri}, generate and prune two variants of each rule Ri from randomized data using procedure 1.1 and 1.2. But one variant is generated from an empty rule while the other is generated by greedily adding antecedents to the original rule. Moreover, the pruning metric used here is (TP+TN)/(P+N).Then the smallest possible DL for each variant and the original rule is computed. The variant with the minimal DL is selected as the final representative of Ri in the rule set. After all the rules in {Ri} have been examined and if there are still residual positives, more rules are generated based on the residual positives using Building Stage again. 3. Delete the rules from the rule set that would increase the DL of the whole rule set if it were in it. and add resultant rule set to RS. ENDDO Note that there seem to be 2 bugs in the original ripper program that would affect the rule set size and accuracy slightly. This implementation avoids these bugs and thus is a little bit different from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is not defined in ripper, there still seems to be some trivial difference between this implementation and the original ripper, especially for audiology data in UCI repository, where there are lots of classes of few instances. Details please see:


 * 1) William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.

Installation
The caret package can be installed by using the following command on R's command-line:

install.packages("caret", dependencies = TRUE)

The above command shall recursively download and install all packages that caret depend to along with fpc itself.

Example
The example in this section will illustrate the carets's JRip usage on the IRIS database:

>library(caret) >library(RWeka) >data(iris) >TrainData <- iris[,1:4] >TrainClasses <- iris[,5] >jripFit <- train(TrainData, TrainClasses,method = "JRip")

Dataset
The Iris dataset contains 150 instances, corresponding to three equally-frequent species of iris plant (Iris setosa, Iris versicolour, and Iris virginica). An Iris versicolor is shown below, courtesy of Wikimedia Commons.



Each instance contains four attributes:sepal length in cm, sepal width in cm, petal length in cm, and petal width in cm. The next picture shows each attribute plotted against the others, with the different classes in color.



Execution and Results
First of all, we need to specify which base we are going to use:

> data(iris) > summary(iris) Sepal.Length   Sepal.Width     Petal.Length    Petal.Width Min. :4.300  Min. :2.000  Min. :1.000  Min.   :0.100 1st Qu.:5.100  1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300 Median :5.800  Median :3.000   Median :4.350   Median :1.300 Mean  :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199 3rd Qu.:6.400  3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800 Max. :7.900  Max. :4.400  Max. :6.900  Max. :2.500        Species setosa   :50 versicolor:50 virginica :50

After that, we are ready to create a Naïve Bayes model to the dataset using the first 4 columns to predict the fifth.

>data(iris) >varIndex <- 1:numSamples > >TrainData <- iris[,1:4] >TrainClasses <- iris[,5] >jripFit <- train(TrainData, TrainClasses,method = "JRip",preProcess = c("center", "scale"),tuneLength = 10,trControl = trainControl(method = "cv"))

Output
Loading required package: class Attaching package: 'class' The following object(s) are masked from 'package:reshape': condense Fitting: NumOpt=1 Fitting: NumOpt=2 Fitting: NumOpt=3 Fitting: NumOpt=4 Fitting: NumOpt=5 Fitting: NumOpt=6 Fitting: NumOpt=7 Fitting: NumOpt=8 Fitting: NumOpt=9 Fitting: NumOpt=10 Aggregating results Selecting tuning parameters Fitting model on full training set

Result
> jripFit Call: train.default(x = TrainData, y = TrainClasses, method = "JRip",     preProcess = c("center", "scale"), trControl = trainControl(method = "cv"),      tuneLength = 10) 150 samples 4 predictors Pre-processing: centered, scaled Resampling: Cross-Validation (10 fold) Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... Resampling results across tuning parameters: NumOpt Accuracy  Kappa  Accuracy SD  Kappa SD  Selected 1      0.953     0.93   0.045        0.0675               2       0.953     0.93   0.045        0.0675    *          3       0.933     0.9    0.0444       0.0667               4       0.94      0.91   0.0584       0.0876               5       0.94      0.91   0.0584       0.0876               6       0.94      0.91   0.0584       0.0876               7       0.94      0.91   0.0584       0.0876               8       0.94      0.91   0.0584       0.0876               9       0.94      0.91   0.0584       0.0876               10      0.94      0.91   0.0584       0.0876             Accuracy was used to select the optimal model using the largest value. The final value used for the model was NumOpt = 2.

The caret package ran the training tuning the NumOpt JRip parameter from 1 to 10 and chose the best performance which is NumOpt=2 with a 95.3% accuracy. If some other algorithm was chosen, other algorithm parameter would be tuned.

If we plot the results we have a plot of the parameter choosing accuracy:

>plot(jripFit)