Data Mining Algorithms In R/Frequent Pattern Mining/The Eclat Algorithm

Introduction

The Eclat Algorithm
The Eclat algorithm is used to perform itemset mining. Itemset mining lets us find frequent patterns in data, for example: if a consumer buys milk, they also buy bread. Patterns of this type are called association rules and are used in many application domains.

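The basic idea of the Eclat algorithm is to use tidset intersections to compute the support of a candidate itemset, avoiding the generation of subsets that do not exist in the prefix tree. The tidset $$t(X)$$ of an itemset $$X$$ is the set of identifiers of the transactions that contain $$X$$, so the support of $$X$$ is the size of its tidset. The algorithm was originally proposed by Zaki, Parthasarathy et al.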
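The tidset idea can be illustrated with a few lines of base R. This is only a minimal sketch on a made-up transaction list; the item names and variable names (transactions, tidsets) are assumptions chosen for illustration and are not part of any package.

```r
# Toy transaction database (hypothetical data): one element per transaction.
transactions <- list(
  t1 = c("milk", "bread"),
  t2 = c("milk", "bread", "butter"),
  t3 = c("bread"),
  t4 = c("milk", "butter")
)

# Vertical layout: for each item, the ids of the transactions that contain it.
items <- sort(unique(unlist(transactions)))
tidsets <- lapply(setNames(items, items), function(item) {
  unname(which(vapply(transactions, function(tr) item %in% tr, logical(1))))
})

# The tidset of {milk, bread} is the intersection of the single-item tidsets.
tid_milk_bread <- intersect(tidsets[["milk"]], tidsets[["bread"]])

# Relative support = size of the tidset / number of transactions.
support_milk_bread <- length(tid_milk_bread) / length(transactions)

print(tid_milk_bread)
print(support_milk_bread)
```

No pass over the transactions is needed to count {milk, bread}: the support follows directly from the two tidsets that were already computed.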

Algorithm
The Eclat algorithm is defined recursively. The initial call uses all the single items with their tidsets. In each recursive call, the function IntersectTidsets combines each itemset-tidset pair $$\left\langle {X,t(X)} \right\rangle$$ with all the other pairs $$\left\langle {Y,t(Y)} \right\rangle$$ to generate new candidates $$N_{XY}$$. If the new candidate is frequent, it is added to the set $$P_{X}$$. Then, recursively, the algorithm finds all the frequent itemsets in the $$X$$ branch. The search proceeds depth-first (DFS) until all the frequent itemsets have been found.
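The recursion described above can be sketched in base R as follows. This is an illustrative sketch, not the implementation used by any package: the function name eclat_sketch, the absolute threshold min_count and the toy data are assumptions. It also assumes the common formulation in which each pair is combined only with the pairs that come after it, so that no itemset is generated twice.

```r
# Recursive Eclat sketch (base R only, hypothetical helper).
# itemsets: list of item vectors; tidsets: list of the matching tidsets.
eclat_sketch <- function(itemsets, tidsets, min_count) {
  result <- list()
  for (i in seq_along(itemsets)) {
    x <- itemsets[[i]]
    tx <- tidsets[[i]]
    # X is frequent, because only frequent pairs are ever passed in.
    result[[length(result) + 1]] <- list(items = x, count = length(tx))

    # Build P_X: the frequent candidates obtained by joining X with each later Y.
    new_sets <- list()
    new_tids <- list()
    if (i < length(itemsets)) {
      for (j in (i + 1):length(itemsets)) {
        txy <- intersect(tx, tidsets[[j]])  # t(XY) = t(X) intersected with t(Y)
        if (length(txy) >= min_count) {
          new_sets[[length(new_sets) + 1]] <- union(x, itemsets[[j]])
          new_tids[[length(new_tids) + 1]] <- txy
        }
      }
    }
    # Depth-first: finish the X branch before moving to the next itemset.
    if (length(new_sets) > 0) {
      result <- c(result, eclat_sketch(new_sets, new_tids, min_count))
    }
  }
  result
}

# Initial call: all frequent single items with their tidsets (toy data).
transactions <- list(c("a", "b", "c"), c("a", "b"), c("a", "c"),
                     c("b", "c"), c("a", "b", "c"))
items <- sort(unique(unlist(transactions)))
tids <- lapply(items, function(it) {
  which(vapply(transactions, function(tr) it %in% tr, logical(1)))
})
min_count <- 3
keep <- vapply(tids, length, integer(1)) >= min_count
frequent <- eclat_sketch(as.list(items[keep]), tids[keep], min_count)

labels <- vapply(frequent, function(f) paste(f$items, collapse = ","), character(1))
counts <- vapply(frequent, function(f) f$count, integer(1))
print(data.frame(itemset = labels, count = counts))
```

In this toy run the candidate {a, b, c} occurs in only two transactions, so it is discarded and the "a,b" branch is not extended further.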

Implementation
The Eclat algorithm is available in the arules package for R.

 * R package: arules
 * Method: eclat
 * Documentation: arules package

Usage

eclat(data, parameter = NULL, control = NULL)

Arguments


 * data: object of class transactions or any data structure which can be coerced into transactions (e.g., binary matrix, data.frame).
 * parameter: object of class ECparameter or named list (default values are: support 0.1 and maxlen 5).
 * control: object of class ECcontrol or named list for algorithmic controls.

Value

Returns an object of class itemsets

Example
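The call can be tried on a small hand-made data set. The sketch below assumes that the arules package is installed; the baskets list is hypothetical toy data, and the thresholds are chosen only so that the result stays small.

```r
library(arules)

# Hypothetical toy baskets; a list of item vectors can be coerced to transactions.
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("bread", "butter"),
  c("milk", "bread")
)
trans <- as(baskets, "transactions")

# Mine itemsets that occur in at least half of the baskets, with at most 3 items.
itemsets <- eclat(trans,
                  parameter = list(support = 0.5, maxlen = 3),
                  control = list(verbose = FALSE))

# Show the itemsets, most frequent first.
inspect(sort(itemsets, by = "support"))
```

The parameter and control arguments are given here as named lists, which is the second of the two forms listed under Arguments.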

Visualization
The arules package implements some visualization methods for itemsets, which are the return type for the eclat algorithm. Here are some examples:
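As a rough sketch of how such plots can be produced, the code below mines a hypothetical toy data set and then plots the items of the resulting itemsets. It assumes that arules is installed; the data and the choice of plots are illustrative and are not necessarily the ones originally shown on this page.

```r
library(arules)

# Hypothetical toy baskets, mined with a low threshold to get a few itemsets.
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("bread", "butter"),
  c("milk", "bread")
)
itemsets <- eclat(as(baskets, "transactions"),
                  parameter = list(support = 0.5),
                  control = list(verbose = FALSE))

# items() extracts the itemsets as an itemMatrix (rows: itemsets, columns: items).
mined_items <- items(itemsets)

# Bar chart: fraction of the mined itemsets in which each item appears.
itemFrequencyPlot(mined_items)

# Matrix image: a dark cell means the item (column) is in the itemset (row).
print(image(mined_items))
```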

Example 1

Example 2

Use Case
To show a real use of the Eclat algorithm, this section uses data from the Northwind database. The Northwind database is freely available for download and represents data from an enterprise. The example uses the Order Details table, which relates orders to products (in an n-to-n relationship). The Eclat algorithm is used to find frequent patterns in this data, to see whether there are any products that are bought together.

Scenario
Given the data from the Order Details table of the Northwind database, find all the frequent itemsets with a minimum support of 0.1 and a length of at least 2.

Input Data
The order details table has the fields:


 * ID: primary key
 * Order ID: foreign key from table Orders
 * Product ID: foreign key from table Products
 * Quantity: the quantity bought
 * Discount: the discount offered
 * Unit Price: the unit price of the product

To use the data, some pre-processing is necessary. The table may have many rows that belong to the same order, so the table was converted so that all the rows for one order became a single row in the new table, containing the product IDs of the products belonging to that order. The fields ID, Order ID, Quantity, Discount and Unit Price were discarded. The data was saved in a text file called northwind-orders.txt. The file was written in a form ready to be loaded as a list object in R.
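The grouping step can be sketched in base R as follows. The data frame order_details and its column names OrderID and ProductID are assumptions made for illustration, and the rows are a small hypothetical extract, not the actual table.

```r
# Hypothetical extract of the Order Details table (column names are assumptions).
order_details <- data.frame(
  OrderID   = c(10248, 10248, 10248, 10249, 10249, 10250),
  ProductID = c(11, 42, 72, 14, 51, 41)
)

# Keep only the product ids and group them by order:
# the result is a list with one vector of product ids per order.
orders <- split(order_details$ProductID, order_details$OrderID)

# Drop repeated products within an order, since itemsets ignore quantities.
orders <- lapply(orders, unique)

print(orders)
```

A list of this shape is the kind of object that arules can coerce into transactions.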

Implementation
To run the example, the arules package needs to be loaded in R.

First, the data is loaded into a list object in R.

Second, the eclat algorithm is used.
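A call consistent with the parameter specification printed in the output might look like the sketch below. The orders list is a small hypothetical stand-in for the contents of northwind-orders.txt, so the numbers it produces will not match the output shown here; only the parameter values (support 0.1, minlen 2, tidLists TRUE) are taken from that output.

```r
library(arules)

# Hypothetical stand-in for the list loaded from northwind-orders.txt.
orders <- list(
  c("11", "42", "72"),
  c("11", "42", "72"),
  c("14", "51"),
  c("11", "72"),
  c("41", "51", "65")
)

# Parameter values taken from the printed specification:
# support 0.1, minlen 2 (ignore single items), tidLists TRUE.
itemsets <- eclat(as(orders, "transactions"),
                  parameter = list(support = 0.1, minlen = 2, tidLists = TRUE))

# Number of items in each mined itemset (all should be at least 2).
print(size(items(itemsets)))
```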

parameter specification:
 tidLists support minlen maxlen            target   ext
     TRUE     0.1      2      5 frequent itemsets FALSE

algorithmic control:
 sparse sort verbose
      7   -2    TRUE

eclat - find frequent item sets with the eclat algorithm
version 2.6 (2004.08.16)        (c) 2002-2004   Christian Borgelt
create itemset ...
set transactions ...[78 item(s), 1041 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating bit matrix ... [3 row(s), 1041 column(s)] done [0.00s].
writing ... [4 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].

Output Data
The itemsets object holds the output of the execution of the Eclat algorithm. As can be seen above, 4 sets were generated. The results can be displayed by inspecting the object (for example with the inspect function of arules):

  items         support
1 {11, 42, 72}  0.1940442
2 {42, 72}      0.1940442
3 {11, 42}      0.1940442
4 {11, 72}      0.1959654

Analysis
As can be seen above, the Eclat algorithm returns 4 frequent itemsets. This output was induced by the replication of the transaction {11, 42, 72} many times in the data. The result shows that the itemsets {11, 42, 72}, {42, 72} and {11, 42} each have a support of 19.40%, and the itemset {11, 72} has a support of 19.60%.
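The relative supports can be turned back into transaction counts by multiplying by the number of transactions (1041, as reported in the log). The short sketch below does this arithmetic in base R; the variable names are chosen for illustration.

```r
# Number of transactions reported in the eclat log.
n_transactions <- 1041

# Relative supports as printed in the output above.
support <- c("{11, 42, 72}" = 0.1940442,
             "{42, 72}"     = 0.1940442,
             "{11, 42}"     = 0.1940442,
             "{11, 72}"     = 0.1959654)

# Absolute support = relative support * number of transactions.
counts <- round(support * n_transactions)
print(counts)
```

The three itemsets with support 0.1940442 occur in 202 transactions each, while {11, 72} occurs in 204, that is, in two transactions that do not contain product 42.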

The product IDs 11, 42 and 72 represent the products Queso Cabrales, Singaporean Hokkien Fried Mee and Mozzarella di Giovanni, respectively. So, the output of the Eclat algorithm suggests a strong pattern of these items being bought together.

PPV, PrePost, and FIN Algorithm
These three algorithms were proposed by Deng et al. and are based on three novel data structures, called Node-list, N-list and Nodeset respectively, that facilitate the mining of frequent itemsets. Each structure is a set of nodes in an FP-tree, with each node encoded by its pre-order and post-order traversal. Compared with Node-lists, N-lists and Nodesets are more efficient. As a result, PrePost and FIN are more efficient than PPV. See the original papers by Deng et al. for more details.