How to Convert Data.Frame to Transactions for Arules

R (arules) Convert dataframe into transactions and remove NA

Ogustari is right. Here is the complete code that also handles the transaction IDs.

library("arules")
library("dplyr") ### for tbl_df
df <- structure(list(Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"),
Fruits = c(NA, "Apple", "Orange", NA, "Pear", "Grape"),
Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"),
Personal = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA),
Drink = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"),
Other = c(NA, NA, NA, NA, "Promo", NA)),
.Names = c("Transaction_ID", "Fruits", "Vegetables", "Personal", "Drink", "Other"),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

### save the transaction IDs, then drop the ID column
tid <- as.character(df[["Transaction_ID"]])
df <- df[,-1]

### make all columns factors
for(i in seq_along(df)) df[[i]] <- as.factor(df[[i]])

trans <- as(df, "transactions")

### set transactionIDs
transactionInfo(trans)[["transactionID"]] <- tid

inspect(trans)

    items                                          transactionID
[1] {Personal=ToothP,Drink=Coff}                   A001
[2] {Personal=ToothP}                              A002
[3] {Drink=Coff}                                   A003
[4] {Vegetables=Potato,Personal=ToothB,Drink=Milk} A004
[5] {Personal=ToothB,Drink=Milk,Other=Promo}       A005
[6] {Vegetables=Yam,Drink=Coff}                    A006
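
To check that NA cells really contribute no items, the same conversion can be run on a smaller data frame. This is only a sketch: the data frame toy and its columns are made up for illustration and are not part of the question.

```r
library(arules)

# Hypothetical toy data: one row per transaction, NA means "nothing in this category".
toy <- data.frame(
  id    = c("T1", "T2", "T3"),
  fruit = c("Apple", NA, "Pear"),
  drink = c(NA, "Milk", "Milk"),
  stringsAsFactors = FALSE
)

# Keep the IDs aside and convert every remaining column to a factor.
ids <- toy$id
items <- toy[, -1]
items[] <- lapply(items, factor)

trans <- as(items, "transactions")
transactionInfo(trans)[["transactionID"]] <- ids

size(trans)        # number of items per transaction; NA cells add nothing
itemLabels(trans)  # labels have the form "column=value"
```

Here size() gives the item count per transaction, which is a quick way to confirm that a row with an NA simply ends up with fewer items.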

Correctly convert data.frame to transactions for arules

We can split the rows by the 'data' column and unlist each piece, so that every date becomes one vector of items:

df_trans <- as(setNames(lapply(split(noticias_json[-3],
noticias_json$data), unlist), NULL), "transactions")

inspect(df_trans)
# items
#[1] {icarai,
# trafico de drogas}
#[2] {danilo passos,
# porte ilegal de armas,
# roubo,
# serra verde,
# trafico de drogas}

data

noticias_json <- structure(list(bairro = structure(list("icarai", 
c("danilo passos",
"serra verde")), class = "AsIs"), crime = structure(list("trafico de drogas",
c("trafico de drogas", "porte ilegal de armas", "roubo")), class = "AsIs"),
data = c("01-02-2016", "31-02-2016")), .Names = c("bairro",
"crime", "data"), row.names = c(NA, -2L), class = "data.frame")

How to convert a data frame to arules' transaction object

Here is what I tried. I think you need to reshape your data into a list. First, I created a transaction ID for each row. Then I reshaped the data to long format, so that all products sit in a single column, and removed the rows where the product is NA. Next, I converted the product column to a factor and, for each transaction ID, collected its products into a list. The result, x, has a list column called whatever; this is the list to use when creating the transactions object.

library(tidyverse)
library(arules)

mutate(mydata, transaction_id = 1:n()) %>%
  pivot_longer(cols = contains("Item"), names_to = "item", values_to = "product") %>%
  filter(complete.cases(product)) %>%
  mutate(product = factor(product)) %>%
  group_by(transaction_id) %>%
  summarize(whatever = list(product)) -> x

# Assign transaction ID as name to whatever
names(x$whatever) <- x$transaction_id

$`1`
[1] lipstick Bronzer Mascara
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover

$`2`
[1] Eyeshadow lipstick
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover

$`3`
[1] Powder Remover
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover

$`4`
[1] Nail varnish Lip gloss Eyeliner
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover

Finally, I created a transaction-class object.

mybasket <- as(x$whatever, "transactions")

> summary(mybasket)
transactions as itemMatrix in sparse format with
4 rows (elements/itemsets/transactions) and
9 columns (items) and a density of 0.2777778

most frequent items:
 lipstick   Bronzer  Eyeliner Eyeshadow Lip gloss   (Other)
        2         1         1         1         1         4

element (itemset/transaction) length distribution:
sizes
2 3
2 2

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    2.0     2.0     2.5     2.5     3.0     3.0

includes extended item information - examples:
     labels
1   Bronzer
2  Eyeliner
3 Eyeshadow

includes extended transaction information - examples:
  transactionID
1             1
2             2
3             3

DATA

mydata <- structure(list(Transaction = c("12/09/2001", "2/09/2001", "13/09/2002", 
"14/09/2003"), Item1 = c("lipstick", "Eyeshadow", "Powder", "Nail varnish"
), Item2 = c("Bronzer", "lipstick", "Remover", "Lip gloss"),
Item3 = c("Mascara", NA, NA, "Eyeliner")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))

long dataframe to transactions for arules in R

Here is one way to do it that I find faster. The idea is to build a wide data frame of 0/1 values, with one row per transaction and one column per item, and then convert that to transactions. It does not require any split.

library(dplyr)
library(tidyr)
library(arules)

df <- df %>%
select(TID, itemNO) %>%
distinct() %>%
mutate(value = 1) %>%
spread(itemNO, value, fill = 0)

itemMatrix <- as(as.matrix(df[, -1]), 'transactions')
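
The same idea can be tried without dplyr or tidyr by filling an incidence matrix directly. The following is a minimal sketch in base R, assuming a long data frame with the TID and itemNO columns used above; the data frame long and its values are invented for illustration.

```r
library(arules)

# Hypothetical long-format data: one row per (transaction, item) pair.
long <- data.frame(
  TID    = c(1, 1, 2, 3, 3, 3),
  itemNO = c("a", "b", "a", "b", "c", "c"),  # the (3, "c") pair is duplicated on purpose
  stringsAsFactors = FALSE
)

long  <- unique(long)                # same role as distinct()
tids  <- sort(unique(long$TID))
items <- sort(unique(long$itemNO))

# Logical incidence matrix: rows are transactions, columns are items.
m <- matrix(FALSE, nrow = length(tids), ncol = length(items),
            dimnames = list(as.character(tids), items))
m[cbind(match(long$TID, tids), match(long$itemNO, items))] <- TRUE

trans <- as(m, "transactions")
inspect(trans)
```

A logical matrix is used here instead of 0/1 numbers; both describe the same incidence structure, and the logical form avoids any ambiguity about what a non-zero value means.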

Convert R data.frame column to Arules transactions

Have a look at the examples in ?transactions. You need a list of item vectors (item labels), one vector per transaction, and not a data.frame.

items <- strsplit(as.character(a_df$Tags), ", ")
trans3 <- as(items, "transactions")

rules <- apriori(trans3, parameter = list(sup = 0.1, conf = 0.5, target="rules",minlen=1))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
        0.5    0.1    1 none FALSE            TRUE       5     0.1      1     10
 target   ext
  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 0

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[22 item(s), 7 transaction(s)] done [0.00s].
sorting and recoding items ... [22 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [198 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
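
Since a_df itself is not shown, here is a small self-contained sketch of the same strsplit approach. The contents of a_df are made up for illustration; only the Tags column name comes from the code above. Duplicated tags within a row are removed first, because a transaction is a set of items.

```r
library(arules)

# Hypothetical stand-in for a_df: one comma-separated string of tags per row.
a_df <- data.frame(
  Tags = c("r, arules, data-mining", "r, dplyr", "arules, r, r"),
  stringsAsFactors = FALSE
)

# Split on commas, trim stray whitespace, and drop duplicates within a row.
items <- strsplit(as.character(a_df$Tags), ",")
items <- lapply(items, function(x) unique(trimws(x)))

trans <- as(items, "transactions")
inspect(trans)
```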

correct converting dataframe into transactions for arules in R

This happens because the downloaded file is comma-delimited, but g=read.csv("g.csv",sep=";") splits it on semicolons. You should get the desired output if you remove sep = ";" from your definition of g.

See the following, which defines sep as ;:

> trans <-  read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ';')
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
.. .. ..@ p : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
.. .. ..@ Dim : int [1:2] 7011 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 7011 obs. of 1 variable:
.. ..$ labels: chr [1:7011] "abrasive cleaner" "abrasive cleaner,napkins" "artif. sweetener" "artif. sweetener,coffee" ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables

And this, which defines sep as ,:

> trans <-  read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ',')
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 29 88 118 132 33 157 167 166 38 91 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 1 variable:
.. ..$ labels: chr [1:169] "abrasive cleaner" "artif. sweetener" "baby cosmetics" "baby food" ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
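
The effect of the separator can also be reproduced on a tiny file. This sketch writes a hypothetical three-line basket file to a temporary path (the contents are invented, not taken from groceries.csv) and reads it with both separators.

```r
library(arules)

# Hypothetical comma-delimited basket file, one transaction per line.
path <- tempfile(fileext = ".csv")
writeLines(c("milk,bread", "bread,butter,milk", "beer"), path)

wrong <- read.transactions(path, format = "basket", sep = ";")
right <- read.transactions(path, format = "basket", sep = ",")

itemLabels(wrong)  # each whole line is treated as a single item
itemLabels(right)  # the individual products
```

With the wrong separator every transaction contains exactly one "item", which matches the symptom in the str() output above: thousands of labels such as "abrasive cleaner,napkins".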

