How to Convert Data.Frame to Transactions for Arules

R (arules) Convert dataframe into transactions and remove NA

Ogustari is right. Here is the complete code that also handles the transaction IDs.

library("arules")
library("dplyr")  ### for tbl_df
df <- structure(list(Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"), 
  Fruits = c(NA, "Apple", "Orange", NA, "Pear", "Grape"), 
  Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"), 
  Personal = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA), 
  Drink = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"), 
  Other = c(NA, NA, NA, NA, "Promo", NA)), 
  .Names = c("Transaction_ID", "Fruits", "Vegetables", "Personal", "Drink", "Other"), 
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

### save the transaction IDs, then drop that column
tid <- as.character(df[["Transaction_ID"]])
df <- df[,-1]

### make all columns factors
for (i in seq_len(ncol(df))) df[[i]] <- as.factor(df[[i]])

trans <- as(df, "transactions")

### set transactionIDs
transactionInfo(trans)[["transactionID"]] <- tid

inspect(trans)

   items                                          transactionID
[1] {Personal=ToothP,Drink=Coff}                   A001         
[2] {Personal=ToothP}                              A002         
[3] {Drink=Coff}                                   A003         
[4] {Vegetables=Potato,Personal=ToothB,Drink=Milk} A004         
[5] {Personal=ToothB,Drink=Milk,Other=Promo}       A005         
[6] {Vegetables=Yam,Drink=Coff}                    A006
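
As the output shows, the NA cells do not become items. A minimal sketch of how one might check that, assuming a smaller stand-in data frame (the two item columns below are illustrative, not the full data from above):

```r
library(arules)

# Small stand-in for the data above (plain data.frame, no dplyr needed)
df <- data.frame(
  Transaction_ID = c("A001", "A002", "A003"),
  Fruits = c(NA, "Apple", "Orange"),
  Drink  = c("Coff", NA, "Coff"),
  stringsAsFactors = FALSE
)

tid <- df[["Transaction_ID"]]
items_df <- df[, -1]
items_df[] <- lapply(items_df, as.factor)  # every column becomes a factor

trans <- as(items_df, "transactions")
transactionInfo(trans)[["transactionID"]] <- tid

# Each transaction should hold exactly the non-NA cells of its row
sizes  <- size(trans)
non_na <- rowSums(!is.na(df[, -1]))
```

If `sizes` and `non_na` agree, no NA was turned into an item.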

Correctly convert data.frame to transactions for arules

Split the data frame by the 'data' column and unlist each group, so that every date becomes one transaction:

df_trans <- as(setNames(lapply(split(noticias_json[-3],
              noticias_json$data), unlist), NULL), "transactions")

inspect(df_trans)
#    items                  
#[1] {icarai,               
#     trafico de drogas}    
#[2] {danilo passos,        
#     porte ilegal de armas,
#     roubo,                
#     serra verde,          
#     trafico de drogas}
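
To see what the split-and-unlist step hands to `as(..., "transactions")`, it can be run on its own in base R. This is only a sketch: the data frame is rebuilt here with `data.frame()` and `I(list(...))`, which I assume is equivalent to the `structure()` definition in this answer, and `by_date` and `item_list` are names introduced for illustration.

```r
# Two list columns (bairro, crime) plus the date column used for grouping
noticias_json <- data.frame(
  bairro = I(list("icarai", c("danilo passos", "serra verde"))),
  crime  = I(list("trafico de drogas",
                  c("trafico de drogas", "porte ilegal de armas", "roubo"))),
  data   = c("01-02-2016", "31-02-2016"),
  stringsAsFactors = FALSE
)

# One small data frame per date, without the date column itself
by_date <- split(noticias_json[-3], noticias_json$data)

# Flatten each group into a plain character vector of items
item_list <- lapply(by_date, function(g) unname(unlist(g)))

lengths(item_list)
```

The answer's code additionally wraps the list in `setNames(..., NULL)`, which drops the date names before the coercion.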

data

noticias_json <- structure(list(bairro = structure(list("icarai", 
   c("danilo passos", 
"serra verde")), class = "AsIs"), crime = structure(list("trafico de drogas", 
    c("trafico de drogas", "porte ilegal de armas", "roubo")), class = "AsIs"), 
    data = c("01-02-2016", "31-02-2016")), .Names = c("bairro", 
"crime", "data"), row.names = c(NA, -2L), class = "data.frame")

How to convert a data frame to arules' transaction object

Here is what I tried. You need to reshape the data into a list of item vectors. First, I created a transaction ID, just in case. Then I pivoted the data to long format, so that all products sit in one column, removed the rows where the product is NA, and converted the products to a factor. For each group (transaction ID), I created a list containing all of its products. The result, x, has a list column called whatever; this is the list to use to create the transactions object.

library(tidyverse)
library(arules)

mutate(mydata, transaction_id = 1:n()) %>% 
pivot_longer(cols = contains("Item"), names_to = "item", values_to = "product") %>% 
filter(complete.cases(product)) %>% 
mutate(product = factor(product)) %>% 
group_by(transaction_id) %>% 
summarize(whatever = list(product)) -> x

# Assign transaction ID as name to whatever
names(x$whatever) <- x$transaction_id

$`1`
[1] lipstick Bronzer  Mascara 
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover

$`2`
[1] Eyeshadow lipstick 
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover

$`3`
[1] Powder  Remover
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover

$`4`
[1] Nail varnish Lip gloss    Eyeliner    
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover

Finally, I created a transaction-class object.

mybasket <- as(x$whatever, "transactions")

> summary(mybasket)
transactions as itemMatrix in sparse format with
 4 rows (elements/itemsets/transactions) and
 9 columns (items) and a density of 0.2777778 

most frequent items:
 lipstick   Bronzer  Eyeliner Eyeshadow Lip gloss   (Other) 
        2         1         1         1         1         4 

element (itemset/transaction) length distribution:
sizes
2 3 
2 2 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.0     2.0     2.5     2.5     3.0     3.0 

includes extended item information - examples:
     labels
1   Bronzer
2  Eyeliner
3 Eyeshadow

includes extended transaction information - examples:
  transactionID
1             1
2             2
3             3

DATA

mydata <- structure(list(Transaction = c("12/09/2001", "2/09/2001", "13/09/2002", 
"14/09/2003"), Item1 = c("lipstick", "Eyeshadow", "Powder", "Nail varnish"
), Item2 = c("Bronzer", "lipstick", "Remover", "Lip gloss"), 
Item3 = c("Mascara", NA, NA, "Eyeliner")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))
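
For comparison, the same list of item vectors can be built without the tidyverse. This is a sketch under the assumption that the item columns are the ones whose names start with "Item"; `item_cols` and `baskets` are hypothetical names, and the Transaction column is left out for brevity.

```r
library(arules)

mydata <- data.frame(
  Item1 = c("lipstick", "Eyeshadow", "Powder", "Nail varnish"),
  Item2 = c("Bronzer", "lipstick", "Remover", "Lip gloss"),
  Item3 = c("Mascara", NA, NA, "Eyeliner"),
  stringsAsFactors = FALSE
)

item_cols <- grep("^Item", names(mydata), value = TRUE)

# One character vector per row, with the NA cells dropped
baskets <- lapply(seq_len(nrow(mydata)), function(i) {
  row_items <- unlist(mydata[i, item_cols], use.names = FALSE)
  row_items[!is.na(row_items)]
})
names(baskets) <- seq_along(baskets)  # list names serve as transaction IDs

mybasket <- as(baskets, "transactions")
```

Row-wise `lapply()` is slower than the pivot on large data, so this is mainly useful for small tables or for checking the tidyverse result.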

long dataframe to transactions for arules in R

Here is one way I do it, and I find it faster. The idea is to build a wide data frame of 0/1 values and then coerce that to transactions. It does not require any split.

library(dplyr)
library(tidyr)
library(arules)

df <- df %>%
  select(TID, itemNO) %>%
  distinct() %>%
  mutate(value = 1) %>%
  spread(itemNO, value, fill = 0)

itemMatrix <- as(as.matrix(df[, -1]), 'transactions')
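
For reference, the split-based route that this answer avoids looks roughly like the sketch below. It follows the pattern in the examples of ?transactions; `long_df` is made-up data that reuses the TID and itemNO column names from above. On small data it is a convenient cross-check for the wide 0/1 approach.

```r
library(arules)

# Hypothetical long-format data: one row per (transaction, item) pair
long_df <- data.frame(
  TID    = c(1, 1, 2, 2, 2, 3),
  itemNO = c("a", "b", "a", "b", "c", "b"),
  stringsAsFactors = FALSE
)
long_df <- unique(long_df)  # same purpose as distinct() above

# One vector of items per transaction ID, then coerce the list
trans_split <- as(split(long_df$itemNO, long_df$TID), "transactions")

size(trans_split)
```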

Convert R data.frame column to Arules transactions

Have a look at the examples in ?transactions. You need a list of item vectors (item labels), not a data.frame.

items <- strsplit(as.character(a_df$Tags), ", ")
trans3 <- as(items, "transactions")

rules <- apriori(trans3, parameter = list(sup = 0.1, conf = 0.5, target="rules",minlen=1))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
        0.5    0.1    1 none FALSE            TRUE       5     0.1      1     10
 target   ext
  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 0 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[22 item(s), 7 transaction(s)] done [0.00s].
sorting and recoding items ... [22 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [198 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
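
A self-contained sketch of the strsplit step, assuming a made-up `a_df` whose Tags column holds items separated by a comma and a space. The `unique()` call is an addition of mine: as far as I know, arules may reject or warn about duplicated items within one transaction, depending on the version.

```r
library(arules)

# Hypothetical data: one delimited string of tags per row
a_df <- data.frame(
  Tags = c("r, arules, data.frame", "r, dplyr", "arules, r"),
  stringsAsFactors = FALSE
)

# Split each string into a character vector of item labels
items <- strsplit(as.character(a_df$Tags), ", ", fixed = TRUE)

# Drop repeated tags within a row before coercing
items <- lapply(items, unique)

trans3 <- as(items, "transactions")
itemLabels(trans3)
```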

correct converting dataframe into transactions for arules in R

This happens because the downloaded data is comma-delimited, but g=read.csv("g.csv",sep=";") splits it on semicolons. You should get the desired output if you remove sep = ";" from the definition of g.

See the following, which defines sep as ;:

> trans <-  read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ';')
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
  .. .. ..@ p       : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ Dim     : int [1:2] 7011 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 7011 obs. of  1 variable:
  .. ..$ labels: chr [1:7011] "abrasive cleaner" "abrasive cleaner,napkins" "artif. sweetener" "artif. sweetener,coffee" ...
  ..@ itemsetInfo:'data.frame': 0 obs. of  0 variables

And this, which defines sep as ,:

> trans <-  read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ',')
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:43367] 29 88 118 132 33 157 167 166 38 91 ...
  .. .. ..@ p       : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
  .. .. ..@ Dim     : int [1:2] 169 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 169 obs. of  1 variable:
  .. ..$ labels: chr [1:169] "abrasive cleaner" "artif. sweetener" "baby cosmetics" "baby food" ...
  ..@ itemsetInfo:'data.frame': 0 obs. of  0 variables