How to prep transaction data into basket for arules
Take a look at the help page for the "transactions" class for examples of how to get your data in:
library(arules)
?transactions
For your data, split the Part column by Order, then use as() to coerce the resulting list into transactions:
trans <- as(split(data[,"Part"], data[,"Order"]), "transactions")
inspect(trans)
items transactionID
1 {A,B,G} 1
2 {R} 2
3 {A,B} 3
4 {E} 4
5 {Y} 5
6 {A,B,F,V} 6
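The answer does not show the input data. As a sketch, here is a hypothetical long-format data frame (one row per order line, using the Order and Part column names from the answer) reconstructed from the printed transactions, so the split step can be run end to end. The arules step only runs if the package is installed.

```r
# Hypothetical order lines, reconstructed from the inspect() output above;
# the column names Order and Part are taken from the answer's code
data <- data.frame(
  Order = c(1, 1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 6),
  Part  = c("A", "B", "G", "R", "A", "B", "E", "Y", "A", "B", "F", "V"),
  stringsAsFactors = FALSE
)

# split() groups the parts by order: a named list with one vector per order
# and the order IDs ("1" ... "6") as names
baskets <- split(data[, "Part"], data[, "Order"])
str(baskets)

# Coerce the list to transactions (list names become transaction IDs)
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  inspect(trans)
}
```

The key point is that as() expects one element per transaction, which is exactly what split() produces from long-format data.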
How to prepare transaction data for arules
Apparently the unique(s1) line is causing the problem in your code. Is it required?
I managed to create the transactions just by commenting out that line.
sales <- structure(list(sku = c(207426L, 207422L, 207424L, 9793L, 33186L,
72406L), product_id = c(15729L, 15725L, 15727L, 15999L, 15983L,
15992L), item_id = 1:6, order_id = c(1L, 1L, 1L, 2L, 2L, 2L)),
.Names = c("sku", "product_id", "item_id", "order_id"),
class = "data.frame", row.names = c(NA, -6L))
s1 <- split(sales$product_id, sales$order_id)
#s1 <- unique(s1)
tr <- as(s1, "transactions")
tr
transactions in sparse format with
2 transactions (rows) and
6 items (columns)
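If removing duplicates really is required, note that unique(s1) compares whole list elements (entire transactions). To remove repeated items within each transaction, run this instead: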
s1 <- lapply(s1, unique)
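To illustrate the difference, here is a small hypothetical list (the product IDs are borrowed from the sales data above, but the duplicates are invented for the example). unique() on the list removes repeated transactions, while lapply(s1, unique) removes repeated items inside each transaction, which is usually what a basket needs.

```r
# Hypothetical baskets: transaction "1" lists product 15729 twice,
# and transactions "2" and "3" are identical
s1 <- list(`1` = c(15729L, 15725L, 15729L),
           `2` = c(15999L, 15983L),
           `3` = c(15999L, 15983L))

# unique() on a list compares whole elements: it drops the repeated
# transaction, but the duplicate item inside the first one survives
whole <- unique(s1)
length(whole)
whole[[1]]

# lapply(s1, unique) removes repeated items within each transaction
# and keeps every transaction together with its name
per_tx <- lapply(s1, unique)
per_tx
```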
R arules preparing dataset for transactions
I would do it this way (following the examples in the manual page for transactions):
data_list <- split(data$Product, paste(data$OrderDate, data$Customer))
trans <- as(data_list, "transactions")
inspect(trans)
items transactionID
[1] {Milk} 1-Oct John
[2] {Bread,Eggs} 2-Oct John
[3] {Butter,Eggs,Milk} 2-Oct Tom
[4] {Bread,Butter,Eggs,Wine} 3-Oct Sally
itemFrequencyPlot(trans, topN = 5)
Hope this helps!
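Here a transaction is one customer on one date, so the grouping key combines two columns. A runnable sketch, using a hypothetical data frame reconstructed from the output above (the column names OrderDate, Customer and Product come from the answer's code):

```r
# Hypothetical order lines, reconstructed from the inspect() output above
data <- data.frame(
  OrderDate = c("1-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct",
                "3-Oct", "3-Oct", "3-Oct", "3-Oct"),
  Customer  = c("John", "John", "John", "Tom", "Tom", "Tom",
                "Sally", "Sally", "Sally", "Sally"),
  Product   = c("Milk", "Bread", "Eggs", "Butter", "Eggs", "Milk",
                "Bread", "Butter", "Eggs", "Wine"),
  stringsAsFactors = FALSE
)

# Combine date and customer into one key such as "2-Oct Tom",
# then group the products by that key
key <- paste(data$OrderDate, data$Customer)
data_list <- split(data$Product, key)
data_list

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(data_list, "transactions")
  inspect(trans)
}
```

split() also accepts a list of grouping vectors, e.g. split(data$Product, list(data$OrderDate, data$Customer), drop = TRUE), but the resulting names then use "." as the separator (e.g. "2-Oct.Tom").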
correct converting dataframe into transactions for arules in R
This is because the data is comma-delimited when downloaded, but in g=read.csv("g.csv",sep=";") you are splitting it on semicolons. You should get the desired output if you remove sep = ";" from your definition of g.
See the following, which uses sep = ';':
> trans <- read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ';')
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
.. .. ..@ p : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
.. .. ..@ Dim : int [1:2] 7011 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 7011 obs. of 1 variable:
.. ..$ labels: chr [1:7011] "abrasive cleaner" "abrasive cleaner,napkins" "artif. sweetener" "artif. sweetener,coffee" ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
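With sep = ';', each whole line is read as a single item, so the result has 7011 distinct "items" with labels such as "abrasive cleaner,napkins". Compare this, which uses sep = ',' and yields 169 individual items: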
> trans <- read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ',')
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 29 88 118 132 33 157 167 166 38 91 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 1 variable:
.. ..$ labels: chr [1:169] "abrasive cleaner" "artif. sweetener" "baby cosmetics" "baby food" ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
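The effect of the separator can be seen without the full file. This sketch uses two hypothetical lines in the same comma-separated basket layout and splits them both ways in base R; read.transactions() is then run on a temporary copy only if arules is installed.

```r
# Two hypothetical basket lines in the same layout as groceries.csv:
# one transaction per line, items separated by commas
lines <- c("citrus fruit,semi-finished bread,margarine",
           "tropical fruit,yogurt,coffee")

# With ";" there is nothing to split on, so each whole line becomes one item
wrong <- strsplit(lines, ";", fixed = TRUE)
# With "," each line yields its individual items
right <- strsplit(lines, ",", fixed = TRUE)

lengths(wrong)
lengths(right)

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  f <- tempfile(fileext = ".csv")
  writeLines(lines, f)
  trans <- read.transactions(f, format = "basket", sep = ",")
  itemLabels(trans)
}
```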
How to load transaction (basket) data in RapidMiner for association rule?
I finally understood what you meant; sorry, I was being slow. This can be done using operators from the Text Processing Extension, which you have to install from the RapidMiner repository. Once it is installed, you can try this process.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.0.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.0.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="7.0.000" expanded="true" height="68" name="Read CSV" width="90" x="246" y="85">
<parameter key="csv_file" value="C:\Temp\is.txt"/>
<parameter key="column_separators" value="\r\n"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="att1.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="7.0.000" expanded="true" height="82" name="Nominal to Text" width="90" x="380" y="85"/>
<operator activated="true" class="text:process_document_from_data" compatibility="7.0.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="514" y="85">
<parameter key="vector_creation" value="Term Occurrences"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.0.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34">
<parameter key="mode" value="specify characters"/>
<parameter key="characters" value=","/>
</operator>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
The trick is to use Read CSV to read the original file but with end of line as the column separator (column_separators is set to \r\n). This reads each entire line in as a single polynominal attribute. Nominal to Text then converts that attribute to text so that the text processing operators can work on it, and Process Documents from Data (with vector creation set to Term Occurrences) builds the final example set. The important point is to use the Tokenize operator, in "specify characters" mode with "," as the character, to split each line into items at the commas.
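For comparison, the same three steps (read each line whole, tokenize on commas, count term occurrences) can be sketched in base R. This is only an analogue of the RapidMiner process above, not part of it, and the input lines are hypothetical.

```r
# Hypothetical basket file contents, one transaction per line
lines <- c("bread,milk", "milk,eggs,butter", "bread,eggs")

# Each line is kept whole first (like Read CSV with end of line as the
# separator), then tokenized on commas (like Tokenize with characters = ",")
tokens <- strsplit(lines, ",", fixed = TRUE)

# Term-occurrence matrix: one row per line, one column per distinct item,
# each cell counting how often the item appears in that line
items <- sort(unique(unlist(tokens)))
occ <- do.call(rbind, lapply(tokens, function(tk) {
  vapply(items, function(it) sum(tk == it), integer(1))
}))
occ
```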
R (arules) Convert dataframe into transactions and remove NA
Ogustari is right. Here is the complete code that also handles the transaction IDs.
library("arules")
library("dplyr") ### for tbl_df
df <- structure(list(Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"),
Fruits = c(NA, "Apple", "Orange", NA, "Pear", "Grape"),
Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"),
Personal = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA),
Drink = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"),
Other = c(NA, NA, NA, NA, "Promo", NA)),
.Names = c("Transaction_ID", "Fruits", "Vegetables", "Personal", "Drink", "Other"),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
### remove transaction IDs
tid <- as.character(df[["Transaction_ID"]])
df <- df[,-1]
### make all columns factors
for(i in 1:ncol(df)) df[[i]] <- as.factor(df[[i]])
trans <- as(df, "transactions")
### set transactionIDs
transactionInfo(trans)[["transactionID"]] <- tid
inspect(trans)
items transactionID
[1] {Personal=ToothP,Drink=Coff} A001
[2] {Personal=ToothP} A002
[3] {Drink=Coff} A003
[4] {Vegetables=Potato,Personal=ToothB,Drink=Milk} A004
[5] {Personal=ToothB,Drink=Milk,Other=Promo} A005
[6] {Vegetables=Yam,Drink=Coff} A006
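If you prefer not to coerce the columns to factors, a sketch of an alternative is to build the item list yourself: for each row, keep the non-NA cells, label them Column=Value, and name the list with the transaction IDs. It rebuilds the example data as a plain data.frame, so dplyr is not needed; arules is only used if installed.

```r
# Same example data as above, as a plain data.frame of character columns
df <- data.frame(
  Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"),
  Fruits     = c(NA, "Apple", "Orange", NA, "Pear", "Grape"),
  Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"),
  Personal   = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA),
  Drink      = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"),
  Other      = c(NA, NA, NA, NA, "Promo", NA),
  stringsAsFactors = FALSE
)

# For each row: drop the ID column, drop NA cells, label items "Column=Value"
items <- lapply(seq_len(nrow(df)), function(r) {
  row <- unlist(df[r, -1])          # named character vector for one row
  row <- row[!is.na(row)]
  paste(names(row), row, sep = "=") # character(0) if every cell is NA
})
names(items) <- df$Transaction_ID   # list names become transaction IDs

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(items, "transactions")
  inspect(trans)
}
```

The Column=Value labels follow the same pattern arules uses for factor columns, and NA cells simply produce no item.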