DATA MINING
Desktop Survival Guide by Graham Williams
The simplest association analysis is often referred to as market basket analysis. Within Rattle this is enabled when the Baskets button is checked. In this case, the data is thought of as representing shopping baskets (or any other type of collection of items, such as a basket of medical tests, a basket of medicines prescribed to a patient, a basket of stocks held by an investor, and so on). Each basket has a unique identifier, and the variable specified as an Ident variable in the Variables tab is taken as the identifier of a shopping basket. The contents of the basket are then the items contained in the column of data identified as the target variable. For market basket analysis, these are the only two variables used.
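The grouping described above, where an identifier variable names the basket and a target variable names an item, can be sketched outside Rattle in a few lines of Python. This is only an illustration and not Rattle's own implementation: the sample rows and the function name group_into_baskets are hypothetical.

```python
from collections import defaultdict

# Hypothetical (identifier, item) rows: one row per item in a basket.
rows = [
    (1, "bread"),
    (1, "milk"),
    (2, "milk"),
    (2, "eggs"),
    (2, "bread"),
]


def group_into_baskets(rows):
    """Collect the items that share an identifier into one set per basket."""
    baskets = defaultdict(set)
    for ident, item in rows:
        baskets[ident].add(item)
    return dict(baskets)


baskets = group_into_baskets(rows)
for ident, items in sorted(baskets.items()):
    print(ident, sorted(items))
```

Using a set for each basket means an item repeated within the same basket is recorded only once, which suits an analysis concerned with which items occur together rather than how many of each were bought.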
To illustrate market basket analysis with Rattle, we will use a
very simple dataset consisting of the DVD movies purchased by
customers. Suppose the data is stored in the file dvdtrans.csv
and consists of the following:
ID,Item
1,Sixth Sense
1,LOTR1
1,Harry Potter1
1,Green Mile
1,LOTR2
2,Gladiator
2,Patriot
2,Braveheart
3,LOTR1
3,LOTR2
4,Gladiator
4,Patriot
4,Sixth Sense
5,Gladiator
5,Patriot
5,Sixth Sense
6,Gladiator
6,Patriot
6,Sixth Sense
7,Harry Potter1
7,Harry Potter2
8,Gladiator
8,Patriot
9,Gladiator
9,Patriot
9,Sixth Sense
10,Sixth Sense
10,LOTR
10,Galdiator
10,Green Mile
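To get a feel for this data before running it through Rattle, the following Python sketch reads the rows listed above into baskets and computes support and confidence, two measures commonly used in association analysis. It is a hedged illustration, not what Rattle does internally: the function names read_baskets, support and confidence are our own, and the data is embedded as a string rather than read from dvdtrans.csv so that the example is self-contained.

```python
import csv
import io
from collections import defaultdict

# The rows of dvdtrans.csv, copied verbatim from the listing above.
# Items are compared as literal strings, so "Galdiator" and "LOTR"
# count as items distinct from "Gladiator" and "LOTR1".
DVDTRANS = """\
ID,Item
1,Sixth Sense
1,LOTR1
1,Harry Potter1
1,Green Mile
1,LOTR2
2,Gladiator
2,Patriot
2,Braveheart
3,LOTR1
3,LOTR2
4,Gladiator
4,Patriot
4,Sixth Sense
5,Gladiator
5,Patriot
5,Sixth Sense
6,Gladiator
6,Patriot
6,Sixth Sense
7,Harry Potter1
7,Harry Potter2
8,Gladiator
8,Patriot
9,Gladiator
9,Patriot
9,Sixth Sense
10,Sixth Sense
10,LOTR
10,Galdiator
10,Green Mile
"""


def read_baskets(text):
    """Group the Item column by the ID column, one set of items per basket."""
    baskets = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        baskets[row["ID"]].add(row["Item"])
    return dict(baskets)


def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for contents in baskets.values() if items <= contents)
    return hits / len(baskets)


def confidence(baskets, lhs, rhs):
    """Of the baskets containing `lhs`, the fraction that also contain `rhs`."""
    return support(baskets, set(lhs) | set(rhs)) / support(baskets, lhs)


baskets = read_baskets(DVDTRANS)
print("baskets:", len(baskets))
print("support(Gladiator):", support(baskets, ["Gladiator"]))
print("confidence(Gladiator => Patriot):",
      confidence(baskets, ["Gladiator"], ["Patriot"]))
```

In this data every basket that contains Gladiator also contains Patriot, so the confidence of that rule is 1, while Gladiator itself appears in 6 of the 10 baskets.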
The lower part of the same textview contains information about the running of the algorithm:
Copyright © 2004-2006 Graham.Williams@togaware.com Support further development through the purchase of the PDF version of the book.